PREFACE


This book provides an introduction to probability and mathematical statistics. 
Although the primary focus of the book is on a mathematical development of the 
subject, we also have included numerous examples and exercises that are oriented 
toward applications. We have attempted to achieve a level of presentation that is 
appropriate for senior-level undergraduates and beginning graduate students. 

The second edition involves several major changes, many of which were suggested
by reviewers and users of the first edition. Chapter 2 now is devoted to
general properties of random variables and their distributions. The chapter now
includes moments and moment generating functions, which occurred somewhat
later in the first edition. Special distributions have been placed in Chapter 3.
Chapter 8 is completely changed. It now considers sampling distributions and
some basic properties of statistics. Chapter 15 is also new. It deals with regression
and related aspects of linear models.

As with the first edition, the only prerequisite for covering the basic material is
calculus, with the lone exception of the material on general linear models in
Section 15.4; this assumes some familiarity with matrices. This material can be
omitted if so desired.

Our intent was to produce a book that could be used as a textbook for a 
two-semester sequence in which the first semester is devoted to probability con- 
cepts and the second covers mathematical statistics. Chapters 1 through 7 include 
topics that usually are covered in a one-semester introductory course in probabil- 
ity, while Chapters 8 through 12 contain standard topics in mathematical sta- 
tistics. Chapters 13 and 14 deal with goodness-of-fit and nonparametric statistics. 
These chapters tend to be more methods-oriented. Chapters 15 and 16 cover 
material in regression and reliability, and these would be considered as optional 
or special topics. In any event, judgment undoubtedly will be required in the
choice of topics covered or the amount of time allotted to topics if the desired
material is to be completed in a two-semester course.

It is our hope that those who use the book will find it both interesting and 
informative. 
CHAPTER 1 PROBABILITY


1.1


INTRODUCTION


In any scientific study of a physical phenomenon, it is desirable to have a mathe- 
matical model that makes it possible to describe or predict the observed value of 
some characteristic of interest. As an example, consider the velocity of a falling 
body after a certain length of time, t. The formula v = gt, where g = 32.17 feet per 
second per second, provides a useful mathematical model for the velocity, in feet 
per second, of a body falling from rest in a vacuum. This is an example of a 
deterministic model. For such a model, carrying out repeated experiments under 
ideal conditions would result in essentially the same velocity each time, and this 
would be predicted by the model. On the other hand, such a model may not be 
adequate when the experiments are carried out under less than ideal conditions. 
There may be unknown or uncontrolled variables, such as air temperature or 
humidity, that might affect the outcome, as well as measurement error or other 
factors that might cause the results to vary on different performances of the
experiment. Furthermore, we may not have sufficient knowledge to derive a more
complicated model that could account for all causes of variation.

There are also other types of phenomena in which different results may natu- 
rally occur by chance, and for which a deterministic model would not be appro- 
priate. For example, an experiment may consist of observing the number of 
particles emitted by a radioactive source, the time until failure of a manufactured 
component, or the outcome of a game of chance. 

The motivation for the study of probability is to provide mathematical models 
for such nondeterministic situations; the corresponding mathematical models will 
be called probability models (or probabilistic models). The term stochastic, which 
is derived from the Greek word stochos, meaning “guess,” is sometimes used 
instead of the term probabilistic. 

A careful study of probability models requires some familiarity with the nota- 
tion and terminology of set theory. We will assume that the reader has some 
knowledge of sets, but for convenience we have included a review of the basic 
ideas of set theory in Appendix A. 


1.2


NOTATION AND TERMINOLOGY


The term experiment refers to the process of obtaining an observed result of some 
phenomenon. A performance of an experiment is called a trial of the experiment, 
and an observed result is called an outcome. This terminology is rather general,
and it could pertain to such diverse activities as scientific experiments or games 
of chance. Our primary interest will be in situations where there is uncertainty 
about which outcome will occur when the experiment is performed. We will 
assume that an experiment is repeatable under essentially the same conditions, 
and that the set of all possible outcomes can be completely specified before 
experimentation. 


Definition 1.2.1


The set of all possible outcomes of an experiment is called the sample space, denoted
by S.


Note that one and only one of the possible outcomes will occur on any given trial
of the experiment.


Example 1.2.1 An experiment consists of tossing two coins, and the observed face of each coin is
of interest. The set of possible outcomes may be represented by the sample space


S = {HH, HT, TH, TT}


which simply lists all possible pairings of the symbols H (heads) and T (tails). An
alternative way of representing such a sample space is to list all possible ordered
pairs of the numbers 1 and 0, S = {(1, 1), (1, 0), (0, 1), (0, 0)}, where, for example,
(1, 0) indicates that the first coin landed heads up and the second coin landed
tails up.


Example 1.2.2 Suppose that in Example 1.2.1 we were not interested in the individual outcomes
of the coins, but only in the total number of heads obtained from the two coins.
An appropriate sample space could then be written as S* = {0, 1, 2}. Thus,
different sample spaces may be appropriate for the same experiment, depending
on the characteristic of interest.


Example 1.2.3 If a coin is tossed repeatedly until a head occurs, then the natural sample space is
S = {H, TH, TTH, ...}. If one is interested in the number of tosses required to
obtain a head, then a possible sample space for this experiment would be the set
of all positive integers, S* = {1, 2, 3, ...}, and the outcomes would correspond
directly to the number of tosses required to obtain the first head. We will show in
the next chapter that an outcome corresponding to a sequence of tosses in which
a head is never obtained need not be included in the sample space.


Example 1.2.4 A light bulb is placed in service and the time of operation until it burns out is
measured. At least conceptually, the sample space for this experiment can be
taken to be the set of nonnegative real numbers, S = {t | 0 ≤ t < ∞}.

Note that if the actual failure time could be measured only to the nearest hour,
then the sample space for the actual observed failure time would be the set of
nonnegative integers, S* = {0, 1, 2, 3, ...}. Even though S* may be the observable
sample space, one might prefer to describe the properties and behavior of light
bulbs in terms of the conceptual sample space S. In cases of this type, the
discreteness imposed by measurement limitations is sufficiently negligible that it can
be ignored, and both the measured response and the conceptual response can be
discussed relative to the conceptual sample space S.
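The discrete sample spaces of Examples 1.2.1 through 1.2.3 can be listed mechanically. The following Python sketch is only an illustration; the function names are hypothetical, and the countably infinite space of Example 1.2.3 is necessarily truncated to its first few outcomes. The conceptual space of Example 1.2.4 cannot be listed this way at all.

```python
from itertools import product


def two_coin_space():
    """Example 1.2.1: all ordered pairs of faces, e.g. 'HT' = first head, second tail."""
    return ["".join(pair) for pair in product("HT", repeat=2)]


def heads_count_space(space):
    """Example 1.2.2: the coarser space S* of the total number of heads."""
    return sorted({outcome.count("H") for outcome in space})


def toss_until_head_space(n_outcomes):
    """Example 1.2.3: the first n_outcomes elements of S = {H, TH, TTH, ...} (truncated)."""
    return ["T" * k + "H" for k in range(n_outcomes)]


S = two_coin_space()
print(S)                        # ['HH', 'HT', 'TH', 'TT']
print(heads_count_space(S))     # [0, 1, 2]
S3 = toss_until_head_space(4)
print(S3)                       # ['H', 'TH', 'TTH', 'TTTH']
print([len(e) for e in S3])     # corresponding outcomes of S*: [1, 2, 3, 4]
```

The mapping from S to S* (counting heads, or counting tosses) is what makes several sample spaces available for the same experiment.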


A sample space S is said to be finite if it consists of a finite number of
outcomes, say $S = \{e_1, e_2, \ldots, e_N\}$, and it is said to be countably infinite if its
outcomes can be put into a one-to-one correspondence with the positive integers,
say $S = \{e_1, e_2, \ldots\}$.


Definition 1.2.2


If a sample space S is either finite or countably infinite, then it is called a discrete
sample space.


A set that is either finite or countably infinite also is said to be countable. This
is the case in the first three examples. It is also true for the last example when
failure times are recorded to the nearest hour, but not for the conceptual sample
space. Because the conceptual space involves outcomes that may assume any
value in some interval of real numbers (i.e., the set of nonnegative real numbers),
it could be termed a continuous sample space, and it provides an example where a
discrete sample space is not an appropriate model. Other, more complicated
experiments exist, the sample spaces of which also could be characterized as
continuous, such as experiments involving two or more continuous responses.


Example 1.2.5 Suppose a heat lamp is tested and X, the amount of light produced (in lumens),
and Y, the amount of heat energy (in joules), are measured. An appropriate
sample space would be the Cartesian product of the set of all nonnegative real
numbers with itself,

$$S = [0, \infty) \times [0, \infty) = \{(x, y) \mid 0 \le x < \infty \text{ and } 0 \le y < \infty\}$$

Each variable would be capable of assuming any value in some subinterval of
[0, ∞).

Sometimes it is possible to determine bounds on such physical variables, but
often it is more convenient to consider a conceptual model in which the variables
are not bounded. If the likelihood of the variables in the conceptual model
exceeding such bounds is negligible, then there is no practical difficulty in using
the conceptual model.


Example 1.2.6 A thermograph is a machine that records temperature continuously by tracing a
graph on a roll of paper as it moves through the machine. A thermographic
recording is made during a 24-hour period. The observed result is the graph of a
continuous real-valued function f(t) defined on the time interval [0, 24]
= {t | 0 ≤ t ≤ 24}, and an appropriate sample space would be a collection of such
functions.


Definition 1.2.3


An event is a subset of the sample space S. If A is an event, then A has occurred if it
contains the outcome that occurred.


To illustrate this concept, consider Example 1.2.1. The subset 
A = {HH, HT, TH} 


contains the outcomes that correspond to the event of obtaining “at least one 
head.” As mentioned earlier, if one of the outcomes in A occurs, then we say that 
the event A has occurred. Similarly, if one of the outcomes in B = {HT, TH, TT}
occurs, then we say that the event “at least one tail” has occurred. 

Set notation and terminology provide a useful framework for describing the 
possible outcomes and related physical events that may be of interest in an 
experiment. As suggested above, a subset of outcomes corresponds to a physical 
event, and the event or the subset is said to occur if any outcome in the subset 
occurs. The usual set operations of union, intersection, and complement provide 
a way of expressing new events in terms of events that already have been defined. 
For example, the event C of obtaining “at least one head and at least one tail”
can be expressed as the intersection of A and B, C = A ∩ B = {HT, TH}.
Similarly, the event “at least one head or at least one tail” can be expressed as
the union A ∪ B = {HH, HT, TH, TT}, and the event “no heads” can be expressed
as the complement of A relative to S, A’ = {TT}.

A review of set notation and terminology is given in Appendix A.

In general, suppose S is the sample space for some experiment, and that A and
B are events. The intersection A ∩ B represents the outcomes of the event “A and
B,” while the union A ∪ B represents the event “A or B.” The complement A’
corresponds to the event “not A.” Other events also can be represented in terms
of intersections, unions, and complements. For example, the event “A but not B”
is said to occur if the outcome of the experiment belongs to A ∩ B’, which
sometimes is written as A − B. The event “exactly one of A or B” is said to occur if the
outcome belongs to (A ∩ B’) ∪ (A’ ∩ B). The set A’ ∩ B’ corresponds to the
event “neither A nor B.” The set identity A’ ∩ B’ = (A ∪ B)’ is another way to
represent this event. This is one of the set properties that usually are referred to
as De Morgan’s laws. The other such property is A’ ∪ B’ = (A ∩ B)’.
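Because events are just subsets of S, these operations map directly onto Python's built-in set type. The sketch below, which assumes the two-coin sample space of Example 1.2.1 with A = “at least one head” and B = “at least one tail,” computes each compound event described above and checks both of De Morgan's laws. The helper `complement` is a hypothetical name introduced for illustration.

```python
# Sample space and events from Example 1.2.1, as Python sets of outcome labels
S = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT", "TH"}   # "at least one head"
B = {"HT", "TH", "TT"}   # "at least one tail"


def complement(event):
    """Complement of an event relative to the sample space S."""
    return S - event


print(sorted(A & B))                # "A and B":  ['HT', 'TH']
print(sorted(A | B))                # "A or B":   ['HH', 'HT', 'TH', 'TT']
print(sorted(complement(A)))        # "not A":    ['TT']
print(sorted(A & complement(B)))    # "A but not B", also written A - B: ['HH']

# "Exactly one of A or B"
exactly_one = (A & complement(B)) | (complement(A) & B)
print(sorted(exactly_one))          # ['HH', 'TT']

# De Morgan's laws
assert complement(A) & complement(B) == complement(A | B)
assert complement(A) | complement(B) == complement(A & B)
```

Python's `A - B` gives the same set as `A & complement(B)`, matching the notation A − B used above.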


More generally, if $A_1, \ldots, A_k$ is a finite collection of events, occurrence of an
outcome in the intersection $A_1 \cap \cdots \cap A_k$ (or $\bigcap_{i=1}^{k} A_i$) corresponds to the
occurrence of the event “every $A_i$; $i = 1, \ldots, k$.” The occurrence of an outcome in
the union $A_1 \cup \cdots \cup A_k$ (or $\bigcup_{i=1}^{k} A_i$) corresponds to the occurrence of the event
“at least one $A_i$; $i = 1, \ldots, k$.” Similar remarks apply in the case of a countably
infinite collection $A_1, A_2, \ldots$, with the notations $A_1 \cap A_2 \cap \cdots$ (or $\bigcap_{i=1}^{\infty} A_i$) for
the intersection and $A_1 \cup A_2 \cup \cdots$ (or $\bigcup_{i=1}^{\infty} A_i$) for the union.


The intersection (or union) of a finite or countably infinite collection of events 
is called a countable intersection (or union). 

We will consider the whole sample space S as a special type of event, called the 
sure event, and we also will include the empty set ∅ as an event, called the null
event. Certainly, any set consisting of only a single outcome may be considered as 
an event. 




Definition 1.2.4 


An event is called an elementary event if it contains exactly one outcome of the
experiment. 


In a discrete sample space, any subset can be written as a countable union of 
elementary events, and we have no difficulty in associating every subset with an 
event in the discrete case. 

In Example 1.2.1, the elementary events are {HH}, {HT}, {TH}, and {TT}, and 
any other event can be written as a finite union of these elementary events. Simi- 
larly, in Example 1.2.3, the elementary events are {H}, {TH}, {TTH}, ..., and any 
event can be represented as a countable union of these elementary events. 

It is not as easy to represent events for the continuous examples. Rather than 
attempting to characterize these events rigorously, we will discuss some examples. 

In Example 1.2.4, the light bulbs could fail during any time interval, and any 
interval of nonnegative real numbers would correspond to an interesting event 
for that experiment. Specifically, suppose the time until failure is measured in 
hours. The event that the light bulb “survives at most 10 hours” corresponds to 
the interval A = [0, 10] = {t | 0 ≤ t ≤ 10}. The event that the light bulb “survives
more than 10 hours” is A’ = (10, ∞) = {t | 10 < t < ∞}. If B = [0, 15), then
C = B ∩ A’ = (10, 15) is the event of “failure between 10 and 15 hours.”

In Example 1.2.5, any Cartesian product based on intervals of nonnegative real 
numbers would correspond to an event of interest. For example, the event 


$$(10, 20) \times [5, \infty) = \{(x, y) \mid 10 < x < 20 \text{ and } 5 \le y < \infty\}$$


corresponds to “the amount of light is between 10 and 20 lumens and the amount 
of energy is at least 5 joules.” Such an event can be represented graphically as a 
rectangle in the xy plane with sides parallel to the coordinate axes. 

In general, any physical event can be associated with a reasonable subset of S, 
and often a subset of S can be associated with some meaningful event. For math- 
ematical reasons, though, when defining probability it is desirable to restrict the 
types of subsets that we will consider as events in some cases. Given a collection 
of events, we will want any countable union of these events to be an event. We 
also will want complements of events and countable intersections of events to be
included in the collection of subsets that are defined to be events. We will assume 
that the collection of possible events includes all such subsets, but we will not 
attempt to describe all subsets that might be called events. 

An important situation arises in the following developments when two events 
correspond to disjoint subsets. 


Definition 1.2.5


Two events A and B are called mutually exclusive if A ∩ B = ∅.


If events are mutually exclusive, then they have no outcomes in common. Thus,
the occurrence of one event precludes the possibility of the other occurring.


Example 1.2.7 In Example 1.2.1, if A is the event “at least one head” and if we let B be the event
“both tails,” then A and B are mutually exclusive. Actually, in this example
B = A’ (the complement of A). In general, complementary events are mutually
exclusive, but the converse is not true. For example, if C is the event “both
heads,” then B and C are mutually exclusive, but not complementary.

The notion of mutually exclusive events can be extended easily to more than 
two events. 


Definition 1.2.6


Events $A_1, A_2, A_3, \ldots$ are said to be mutually exclusive if they are pairwise
mutually exclusive. That is, if $A_i \cap A_j = \emptyset$ whenever $i \ne j$.
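For a finite list of events, the pairwise condition can be checked directly by testing every pair. The sketch below is a hypothetical helper (`mutually_exclusive`, not from the text) applied to the two-coin events A, B, and C discussed above. The last lines illustrate why the definition says “pairwise”: three sets can have no outcome common to all of them and still overlap two at a time.

```python
from itertools import combinations


def mutually_exclusive(*events):
    """True when every pair of distinct events has an empty intersection."""
    return all(not (E & F) for E, F in combinations(events, 2))


# Events from the two-coin discussion above
A = {"HH", "HT", "TH"}   # at least one head
B = {"TT"}               # both tails
C = {"HH"}               # both heads

print(mutually_exclusive(A, B))       # True
print(mutually_exclusive(B, C))       # True
print(mutually_exclusive(A, B, C))    # False: A and C share HH

# Empty overall intersection, yet not pairwise mutually exclusive
D1, D2, D3 = {1, 2}, {2, 3}, {1, 3}
print(D1 & D2 & D3)                   # set()
print(mutually_exclusive(D1, D2, D3)) # False
```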


One possible approach to assigning probabilities to events involves the notion 
of relative frequency. 


RELATIVE FREQUENCY 


For the experiment of tossing a coin, we may declare that the probability of 
obtaining a head is 1/2. This could be interpreted in terms of the relative fre- 
quency with which a head is obtained on repeated tosses. Even though the coin 
may be tossed only once, conceivably it could be tossed many times, and
experience leads us to expect a head on approximately one-half of the tosses. At
least conceptually, as the number of tosses approaches infinity, the proportion of
times a head occurs is expected to converge to some constant p. One then might
define the probability of obtaining a head to be this conceptual limiting value. For
a balanced coin, one would expect p = 1/2, but if the coin is unbalanced, or if the
experiment is conducted under unusual conditions that tend to bias the outcomes
in favor of either heads or tails, then this assignment would not be appropriate.

More generally, if m(A) represents the number of times that the event A occurs 
among M trials of a given experiment, then $f_A = m(A)/M$ represents the relative
frequency of occurrence of A on these trials of the experiment. 


An experiment consists of rolling an ordinary six-sided die. A natural sample 
space is the set of the first six positive integers, S = {1, 2, 3, 4, 5, 6}. A simulated 
die-rolling experiment is performed, using a “random number generator” on a 
computer. In Figure 1.1, the relative frequencies of the elementary events
$A_1 = \{1\}$, $A_2 = \{2\}$, and so on are represented as the heights of vertical lines. The
first graph shows the relative frequencies for the first M = 30 rolls, and the 
second graph gives the results for M = 600 rolls. By inspection of these graphs, 


obviously the relative frequencies tend to “stabilize” near some fixed value as M
increases. Also included in the figure is a dotted line of height 1/6, which is the
value that experience would suggest as the long-term relative frequency of the
outcomes of rolling a die. Of course, in this example, the results are more relevant
to the properties of the random number generator used to simulate the experiment
than to those of actual dice.


FIGURE 1.1 Relative frequencies of elementary events for die-rolling experiment
(left panel: M = 30; right panel: M = 600)
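A simulation like the one behind Figure 1.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions. It uses Python's `random` module as the random number generator and an arbitrary seed, so its counts will not reproduce the figure. As M grows, the largest distance between a relative frequency and 1/6 typically shrinks, which is the “stabilizing” described above.

```python
import random
from collections import Counter


def relative_frequencies(M, seed=None):
    """Simulate M rolls of a fair die and return f_{A_i} = m(A_i)/M for A_i = {i}."""
    rng = random.Random(seed)                      # Python's pseudo-random generator
    counts = Counter(rng.randint(1, 6) for _ in range(M))
    return {face: counts[face] / M for face in range(1, 7)}


for M in (30, 600, 60000):
    freqs = relative_frequencies(M, seed=1)        # seed chosen arbitrarily
    spread = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"M = {M:>6}: largest distance from 1/6 = {spread:.3f}")
```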


If, for an event A, the limit of $f_A$ as M approaches infinity exists, then one could
assign probability to A by

$$P(A) = \lim_{M \to \infty} f_A \qquad (1.2.1)$$

This expresses a property known as statistical regularity. Certain technical
questions about this property require further discussion. For example, it is not 
clear under what conditions the limit in equation (1.2.1) will exist, or in what 
sense, or whether it will necessarily be the same for every sequence of trials. Our 
approach to this problem will be to define probability in terms of a set of axioms 
and eventually show that the desired limiting behavior follows. 

To motivate the defining axioms of probability, consider the following properties
of relative frequencies. If S is the sample space for an experiment and A is an
event, then clearly 0 ≤ m(A) and m(S) = M, because m(A) counts the number of
occurrences of A, and S occurs on each trial. Furthermore, if A and B are
mutually exclusive events, then outcomes in A are distinct from outcomes in B,
and consequently m(A ∪ B) = m(A) + m(B). More generally, if $A_1, A_2, \ldots$ are
pairwise mutually exclusive, then $m(A_1 \cup A_2 \cup \cdots) = m(A_1) + m(A_2) + \cdots$.
Thus, the following properties hold for relative frequencies:

$$0 \le f_A \qquad (1.2.2)$$

$$f_S = 1 \qquad (1.2.3)$$

$$f_{A_1 \cup A_2 \cup \cdots} = f_{A_1} + f_{A_2} + \cdots \qquad (1.2.4)$$

if $A_1, A_2, \ldots$ are pairwise mutually exclusive events.
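Properties (1.2.2) through (1.2.4) can be checked on any finite record of trials. The sketch below assumes 1000 simulated die rolls (Python's `random` module, arbitrary seed) and a hypothetical helper `relative_frequency`. It uses exact fractions so that the additivity check is an exact equality rather than a floating-point comparison. Only two mutually exclusive events are used, since a finite record cannot exhibit a countably infinite union.

```python
import random
from fractions import Fraction


def relative_frequency(event, trials):
    """f_A = m(A)/M, where m(A) counts the trials whose outcome lies in A."""
    m = sum(1 for outcome in trials if outcome in event)
    return Fraction(m, len(trials))


rng = random.Random(0)                              # arbitrary seed
trials = [rng.randint(1, 6) for _ in range(1000)]   # M = 1000 simulated die rolls

S = {1, 2, 3, 4, 5, 6}
A1, A2 = {1, 2}, {5}                                # mutually exclusive events

assert relative_frequency(A1, trials) >= 0          # property (1.2.2)
assert relative_frequency(S, trials) == 1           # property (1.2.3)
assert (relative_frequency(A1 | A2, trials)         # property (1.2.4), two events
        == relative_frequency(A1, trials) + relative_frequency(A2, trials))
```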
Although the relative frequency approach may not always be adequate as a
practical method of assigning probabilities, it is the way that probability usually 
is interpreted. However, many people consider this interpretation too restrictive. 
By regarding probability as a subjective measure of belief that an event will occur, 
they are willing to assign probability in any situation involving uncertainty 
without assuming properties such as repeatability or statistical regularity. Sta- 
tistical methods based on both the relative frequency approach and the subjective 
approach will be discussed in later chapters. 


1.3 


DEFINITION OF PROBABILITY 


Given an experiment with an associated sample space S, the primary objective of 
probability modeling is to assign to each event A a real number P(A), called the 
probability of A, that will provide a measure of the likelihood that A will occur 
when the experiment is performed. 

Mathematically, we can think of P(A) as a set function. In other words, it is a 
function whose domain is a collection of sets (events), and the range of which is a 
subset of the real numbers.

Some set functions are not suitable for assigning probabilities to events. The 
properties given in the following definition are motivated by similar properties 
that hold for relative frequencies. 


Definition 1.3.1


For a given experiment, S denotes the sample space and $A, A_1, A_2, \ldots$ represent
possible events. A set function that associates a real value P(A) with each event A is
called a probability set function, and P(A) is called the probability of A, if the
following properties are satisfied:

$$0 \le P(A) \text{ for every } A \qquad (1.3.1)$$

$$P(S) = 1 \qquad (1.3.2)$$

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i) \qquad (1.3.3)$$

if $A_1, A_2, \ldots$ are pairwise mutually exclusive events.


These properties all seem to agree with our intuitive concept of probability, 
and these few properties are sufficient to allow a mathematical structure to be 
developed. 

One consequence of the properties is that the null event (empty set) has
probability zero, P(∅) = 0 (see Exercise 11). Also, if A and B are two mutually
exclusive events, then

$$P(A \cup B) = P(A) + P(B) \qquad (1.3.4)$$


Similarly, if $A_1, A_2, \ldots, A_k$ is a finite collection of pairwise mutually exclusive
events, then

$$P(A_1 \cup A_2 \cup \cdots \cup A_k) = P(A_1) + P(A_2) + \cdots + P(A_k) \qquad (1.3.5)$$


(See Exercise 12.) 

In the case of a finite sample space, notice that there is at most a finite number 
of nonempty mutually exclusive events. Thus, in this case it would suffice to 
verify equation (1.3.4) or (1.3.5) instead of (1.3.3). 


The successful completion of a construction project requires that a piece of
equipment works properly. Assume that either the “project succeeds” ($A_1$) or it
fails because of one and only one of the following: “mechanical failure” ($A_2$) or
“electrical failure” ($A_3$). Suppose that mechanical failure is three times as likely as
electrical failure, and successful completion is twice as likely as mechanical
failure. The resulting assignment of probability is determined by the equations
$P(A_2) = 3P(A_3)$ and $P(A_1) = 2P(A_2)$. Because one and only one of these events
will occur, we also have from (1.3.2) and (1.3.5) that $P(A_1) + P(A_2) + P(A_3) = 1$.
These equations provide a system that can be solved simultaneously: substituting
the first two into the third gives $6P(A_3) + 3P(A_3) + P(A_3) = 10P(A_3) = 1$, so that
$P(A_1) = 0.6$, $P(A_2) = 0.3$, and $P(A_3) = 0.1$. The event “failure” is represented by
the union $A_2 \cup A_3$, and because $A_2$ and $A_3$ are assumed to be mutually
exclusive, we have from equation (1.3.5) that the probability of failure is
$P(A_2 \cup A_3) = 0.3 + 0.1 = 0.4$.
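The same system can be solved exactly in a few lines. In this sketch, every probability is written as a multiple of x = P(A₃), as in the substitution above. Python's `fractions.Fraction` keeps the arithmetic exact, and the event labels are simply dictionary keys chosen for illustration.

```python
from fractions import Fraction

# Express every probability as a multiple of x = P(A3):
#   P(A2) = 3 * P(A3) = 3x,   P(A1) = 2 * P(A2) = 6x
multiples = {"A1": 6, "A2": 3, "A3": 1}

# P(A1) + P(A2) + P(A3) = 1 gives (6 + 3 + 1) x = 1
x = Fraction(1, sum(multiples.values()))
probs = {event: k * x for event, k in multiples.items()}

print({event: float(p) for event, p in probs.items()})  # {'A1': 0.6, 'A2': 0.3, 'A3': 0.1}

failure = probs["A2"] + probs["A3"]    # A2 and A3 are mutually exclusive, so (1.3.5) applies
print(failure)                         # 2/5
```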


PROBABILITY IN DISCRETE SPACES 


The assignment of probability in the case of a discrete sample space can be
reduced to assigning probabilities to the elementary events. Suppose that to each
elementary event $\{e_i\}$ we assign a real number $p_i$, so that $P(\{e_i\}) = p_i$. To satisfy
the conditions of Definition 1.3.1, it is necessary that

$$p_i \ge 0 \text{ for all } i \qquad (1.3.6)$$

$$\sum_i p_i = 1 \qquad (1.3.7)$$


Because each term in the sum (1.3.7) corresponds to an outcome in S, it is an
ordinary summation when S is finite, and an infinite series when S is countably
infinite. The probability of any other event then can be determined from the
above assignment by representing the event as a union of mutually exclusive
elementary events, and summing the corresponding values of $p_i$. A concise
notation for this is given by

$$P(A) = \sum_{e_i \in A} P(\{e_i\}) \qquad (1.3.8)$$

With this notation, we understand that the summation is taken over all indices i
such that $e_i$ is an outcome in A. This approach works equally well for both finite
and countably infinite sample spaces, but if A is a countably infinite set the
summation in (1.3.8) is actually an infinite series.
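For a finite sample space, conditions (1.3.6) and (1.3.7) and the sum (1.3.8) translate directly into code. This sketch covers only the finite case, since a countably infinite space would require summing a series. It reuses the three outcomes and probabilities of the construction-project example. The function names `probability` and `is_valid_assignment` are hypothetical.

```python
from fractions import Fraction


def probability(event, p):
    """Equation (1.3.8): P(A) is the sum of p_i over the outcomes e_i in A."""
    return sum((p[e] for e in event), Fraction(0))


def is_valid_assignment(p):
    """Conditions (1.3.6) and (1.3.7) for a finite sample space."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1


# Outcomes and probabilities of the construction-project example
p = {"success": Fraction(3, 5), "mechanical": Fraction(3, 10), "electrical": Fraction(1, 10)}
assert is_valid_assignment(p)

print(probability({"mechanical", "electrical"}, p))   # 2/5
```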


If two coins are tossed as in Example 1.2.1, then S = {HH, HT, TH, TT}; if the 
coins are balanced, it is reasonable to assume that each of the four outcomes is 
equally likely. Because P(S) = 1, the probability assigned to each elementary 
event must be 1/4. Any event in a finite sample space can be written as a finite
union of distinct elementary events, so the probability of any event is a sum
including the constant term 1/4 for each elementary event in the union. For
example, if C = {HT, TH} represents the event “exactly one head,” then 


P(C) = P({HT}) + P({TH}) = 1/4 + 1/4 = 1/2 


Note that the “equally likely” assumption cannot be applied indiscriminately. 
For example, in Example 1.2.2 the number of heads is of interest, and the sample 
space is S* = {0, 1, 2}. The elementary event {1} corresponds to the event 
C = {HT, TH} in S. Rather than assigning the probability 1/3 to the outcomes in 
S*, we should assign P({1}) = 1/2 and P({0}) = P({2}) = 1/4. 
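The correct assignment on S* can be obtained mechanically from the equally likely outcomes of S. In the sketch below, each count of heads in S* collects the probability of every outcome in S that produces it. The variable names are chosen for illustration only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=2)]   # HH, HT, TH, TT
p = {e: Fraction(1, len(S)) for e in S}             # balanced coins: 1/4 each

# "Exactly one head", C = {HT, TH}
C = {e for e in S if e.count("H") == 1}
print(sum(p[e] for e in C))                         # 1/2

# Induced probabilities on S* = {0, 1, 2}
p_star = defaultdict(Fraction)
for e in S:
    p_star[e.count("H")] += p[e]
print(dict(sorted(p_star.items())))   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

Assigning 1/3 to each element of S* would ignore that the count 1 arises from two distinct outcomes of S.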


In many problems, including those involving games of chance, the nature of
the outcomes dictates the assignment of equal probability to each elementary
event. This type of model sometimes is referred to as the classical probability
model.


CLASSICAL PROBABILITY 


Suppose that a finite number of possible outcomes may occur in an experiment, 
and that it is reasonable to assume that each outcome is equally likely to occur. 
Typical problems involving games of chance—such as tossing a coin, rolling a 
die, drawing cards from a deck, and picking the winning number in a lottery—fit 
this description. Note that the “equally likely” assumption requires the experi- 
ment to be carried out in such a way that the assumption is realistic. That is, the 
coin should be balanced, the die should not be loaded, the deck should be
shuffled, the lottery tickets should be well mixed, and so forth.

This imposes a very special requirement on the assignment of probabilities to
the elementary outcomes. In particular, let the sample space consist of N distinct
outcomes,

$$S = \{e_1, e_2, \ldots, e_N\} \qquad (1.3.9)$$
The “equally likely” assumption requires of the values $p_i$ that

$$p_1 = p_2 = \cdots = p_N \qquad (1.3.10)$$

and, to satisfy equations (1.3.6) and (1.3.7), necessarily

$$p_i = P(\{e_i\}) = \frac{1}{N} \qquad (1.3.11)$$

In this case, because all terms in the sum (1.3.8) are the same, $p_i = 1/N$, it
follows that

$$P(A) = \frac{n(A)}{N} \qquad (1.3.12)$$


where n(A) represents the number of outcomes in A. In other words, if the
outcomes of an experiment are equally likely, then the problem of assigning
probabilities to events is reduced to counting how many outcomes are favorable to
the occurrence of the event as well as how many are in the sample space, and 
then finding the ratio. Some techniques that will be useful in solving some of the 
more complicated counting problems will be presented in Section 1.6. 

The formula presented in (1.3.12) sometimes is referred to as classical
probability. For problems in which this method of assignment is appropriate, it is
fairly easy to show that our general definition of probability is satisfied.
Specifically, for any event A,

$$P(A) = \frac{n(A)}{N} \ge 0$$

$$P(S) = \frac{N}{N} = 1$$

$$P(A \cup B) = \frac{n(A \cup B)}{N} = \frac{n(A) + n(B)}{N} = P(A) + P(B)$$

if A and B are mutually exclusive.
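Equation (1.3.12) amounts to two counts and a ratio. The sketch below is a hypothetical helper, `classical_probability`, that is valid only under the “equally likely” assumption. It is applied to one roll of a balanced die and then checks the three properties just verified in the text.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = n(A)/N, equation (1.3.12); valid only when outcomes are equally likely."""
    S = set(sample_space)
    A = set(event)
    if not A <= S:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(A), len(S))


S = range(1, 7)                               # one roll of a balanced die
print(classical_probability({2, 4, 6}, S))    # 1/2

# The three properties checked in the text
A, B = {1, 2}, {6}                            # mutually exclusive events
assert classical_probability(A, S) >= 0
assert classical_probability(S, S) == 1
assert classical_probability(A | B, S) == classical_probability(A, S) + classical_probability(B, S)
```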


RANDOM SELECTION 


A major application of classical probability arises in connection with choosing an 
object or a set of objects “at random” from a collection of objects. 


Definition 1.3.2


If an object is chosen from a finite collection of distinct objects in such a manner
that each object has the same probability of being chosen, then we say that the
object was chosen at random.


Similarly, if a subset of the objects is chosen so that each subset of the same
size has the same probability of being chosen, then we say that the subset was 
chosen at random. Usually, no distinction is made when the elements of the 
subset are listed in a different order, but occasionally it will be useful to make this 
distinction. 


Example 1.3.3 A game of chance involves drawing a card from an ordinary deck of 52 playing
cards. It should not matter whether the card comes from the top or some other
part of the deck if the cards are well shuffled. Each card would have the same
probability, 1/52, of being selected. Similarly, if a game involves drawing five 
cards, then it should not matter whether the top five cards or any other five cards 
are drawn. The probability assigned to each possible set of five cards would be 
the reciprocal of the total number of subsets of size 5 from a set of size 52. In 
Section 1.6 we will develop, among other things, a method for counting the 
number of subsets of a given size. 
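Section 1.6 develops the counting method itself. As a preview of the numbers involved, Python's standard library provides the binomial coefficient `math.comb` (Python 3.8 or later), which counts the subsets of a given size. The sketch assumes only the 52-card deck and five-card draw described above.

```python
import math
from fractions import Fraction

p_card = Fraction(1, 52)            # one card drawn at random
n_hands = math.comb(52, 5)          # number of subsets of size 5 from a set of size 52
p_hand = Fraction(1, n_hands)       # probability of each particular five-card set

print(n_hands)                      # 2598960
```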


1.4


SOME PROPERTIES OF PROBABILITY


From general properties of sets and the properties of Definition 1.3.1 we can
derive other useful properties of probability. Each of the following theorems
pertains to one or more events relative to the same experiment.


Theorem 1.4.1 If A is an event and A’ is its complement, then

$$P(A) = 1 - P(A') \qquad (1.4.1)$$


Proof 


Because A’ is the complement of A relative to S, S = A ∪ A’. Because
A ∩ A’ = ∅, A and A’ are mutually exclusive, so it follows from equations (1.3.2)
and (1.3.4) that

$$1 = P(S) = P(A \cup A') = P(A) + P(A')$$

which establishes the theorem. ■


This theorem is particularly useful when an event A is relatively complicated, 
but its complement A’ is easier to analyze. 


An experiment consists of tossing a coin four times, and the event A of interest is
“at least one head.” The event A contains most of the possible outcomes, but the
complement, “no heads,” contains only one, $A' = \{TTTT\}$, so $n(A') = 1$. It can be
shown by listing all of the possible outcomes that $n(S) = 16$, so that
$P(A') = n(A')/n(S) = 1/16$. Thus, $P(A) = 1 - P(A') = 1 - 1/16 = 15/16$.
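The complement calculation can be checked by brute force. The following is a minimal Python sketch, not part of the original text, assuming all 16 toss sequences are equally likely; the names `sample_space` and `p_complement` are illustrative only. It lists the outcomes, applies Theorem 1.4.1, and compares the result with a direct count of A.

```python
from fractions import Fraction
from itertools import product

# All 2**4 = 16 equally likely sequences of four tosses, e.g. "HTTH".
sample_space = ["".join(seq) for seq in product("HT", repeat=4)]

# A' = "no heads" = {TTTT}.
complement = [s for s in sample_space if "H" not in s]
p_complement = Fraction(len(complement), len(sample_space))

# Theorem 1.4.1: P(A) = 1 - P(A').
p_at_least_one_head = 1 - p_complement

# Counting A = "at least one head" directly gives the same value.
p_direct = Fraction(sum("H" in s for s in sample_space), len(sample_space))
assert p_at_least_one_head == p_direct
print(len(sample_space), p_complement, p_at_least_one_head)  # 16 1/16 15/16
```

Exact `Fraction` arithmetic is used so that the equality checks are not affected by floating-point rounding.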


Theorem 1.4.2 For any event A, $P(A) \le 1$.


Proof 


From Theorem 1.4.1, $P(A) = 1 - P(A')$. Also, from Definition 1.3.1, we know
that $P(A') \ge 0$. Therefore, $P(A) \le 1$. ■


Note that this theorem combined with Definition 1.3.1 implies that

$$0 \le P(A) \le 1 \tag{1.4.2}$$


Equations (1.3.3), (1.3.4), and (1.3.5) provide formulas for the probability of a 
union in the case of mutually exclusive events. The following theorems provide 
formulas that apply more generally. 


Theorem 1.4.3 For any two events A and B,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{1.4.3}$$


Proof 


The approach will be to express the events $A \cup B$ and $A$ as unions of mutually
exclusive events. From set properties we can show that

$$A \cup B = (A \cap B') \cup B$$
and
$$A = (A \cap B) \cup (A \cap B')$$


See Figure 1.2 for an illustration of these identities. 


FIGURE 1.2 Partitioning of events: $A \cup B = (A \cap B') \cup B$ (left) and $A = (A \cap B) \cup (A \cap B')$ (right)
It also follows that the events $A \cap B'$ and $B$ are mutually exclusive because
$(A \cap B') \cap B = \varnothing$, so that equation (1.3.4) implies

$$P(A \cup B) = P(A \cap B') + P(B)$$

Similarly, $A \cap B$ and $A \cap B'$ are mutually exclusive, so that

$$P(A) = P(A \cap B) + P(A \cap B')$$

The theorem follows from these equations:

$$\begin{aligned}
P(A \cup B) &= P(A \cap B') + P(B) \\
&= [P(A) - P(A \cap B)] + P(B) \\
&= P(A) + P(B) - P(A \cap B)
\end{aligned}$$

■


Suppose one card is drawn at random from an ordinary deck of 52 playing cards. 
As noted in Example 1.3.3, this means that each card has the same probability, 
1/52, of being chosen. 

Let A be the event of obtaining “a red ace” and let B be the event “a heart.” 
Then $P(A) = 2/52$, $P(B) = 13/52$, and $P(A \cap B) = 1/52$. From Theorem 1.4.3 we
have $P(A \cup B) = 2/52 + 13/52 - 1/52 = 14/52 = 7/26$.
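The card calculation lends itself to a direct enumeration. As a sketch (Python, not from the text), assuming each of the 52 cards is equally likely and representing a card as a hypothetical `(rank, suit)` pair, the code below computes $P(A \cup B)$ both from formula (1.4.3) and by counting the union directly.

```python
from fractions import Fraction
from itertools import product

RANKS = "A 2 3 4 5 6 7 8 9 10 J Q K".split()
SUITS = ["hearts", "diamonds", "clubs", "spades"]
RED = {"hearts", "diamonds"}
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(event):
    """Classical probability: number of cards satisfying `event`, over 52."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def red_ace(card):
    rank, suit = card
    return rank == "A" and suit in RED

def heart(card):
    return card[1] == "hearts"

p_a = prob(red_ace)
p_b = prob(heart)
p_ab = prob(lambda c: red_ace(c) and heart(c))

# Theorem 1.4.3 versus a direct count of the union.
p_union_formula = p_a + p_b - p_ab
p_union_direct = prob(lambda c: red_ace(c) or heart(c))
print(p_union_formula, p_union_direct)  # 7/26 7/26
```

Subtracting `p_ab` corrects for the ace of hearts, which is counted once in `p_a` and again in `p_b`.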


Theorem 1.4.3 can be extended easily to three events. 


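Theorem 1.4.4 For any three events A, B, and C,

$$\begin{aligned}
P(A \cup B \cup C) = {} & P(A) + P(B) + P(C) \\
& - P(A \cap B) - P(A \cap C) - P(B \cap C) \\
& + P(A \cap B \cap C)
\end{aligned} \tag{1.4.4}$$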


Proof 


See Exercise 16. 


It is intuitively clear that if every outcome of A is also an outcome of B, then A 
is no more likely to occur than B. The next theorem formalizes this notion. 


Theorem 1.4.5 If $A \subset B$, then $P(A) \le P(B)$.


Proof 


See Exercise 17. ■
Property (1.3.3) provides a formula for the probability of a countably infinite
union when the events are mutually exclusive. If the events are not mutually 
exclusive, then the right side of property (1.3.3) still provides an upper bound for 
this probability, as shown in the following theorem. 


Theorem 1.4.6 Boole’s Inequality If $A_1, A_2, \ldots$ is a sequence of events, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i) \tag{1.4.5}$$


Proof

Let $B_1 = A_1$, $B_2 = A_2 \cap A_1'$, and in general $B_i = A_i \cap \left(\bigcup_{j=1}^{i-1} A_j\right)'$. It follows that
$\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$ and $B_1, B_2, \ldots$ are mutually exclusive. Because $B_i \subset A_i$, it follows
from Theorem 1.4.5 that $P(B_i) \le P(A_i)$, and thus

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(\bigcup_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} P(B_i) \le \sum_{i=1}^{\infty} P(A_i)$$

■

A similar result holds for finite unions. In particular,

$$P(A_1 \cup A_2 \cup \cdots \cup A_k) \le P(A_1) + P(A_2) + \cdots + P(A_k) \tag{1.4.6}$$

which can be shown by a proof similar to that of Theorem 1.4.6.


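Theorem 1.4.7 Bonferroni’s Inequality If $A_1, A_2, \ldots, A_k$ are events, then

$$P\left(\bigcap_{i=1}^{k} A_i\right) \ge 1 - \sum_{i=1}^{k} P(A_i') \tag{1.4.7}$$

Proof

This follows from Theorem 1.4.1 applied to $\bigcap_{i=1}^{k} A_i = \left(\bigcup_{i=1}^{k} A_i'\right)'$, together with
inequality (1.4.6). ■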
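Inequalities (1.4.4), (1.4.6), and (1.4.7) can be checked numerically on a small example. The Python sketch below is an illustration only. It uses a roll of two fair dice with three hypothetical events that do not appear in the text. Boole's inequality and the three-event formula are applied to the complements $A_i'$, and Bonferroni's inequality to the $A_i$ themselves, which mirrors the proof above.

```python
from fractions import Fraction
from itertools import combinations, product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered rolls

def prob(event):
    """P(event) under equally likely outcomes; `event` is a predicate."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

# Hypothetical events A1, A2, A3 chosen only for illustration.
A = [
    lambda w: w[0] <= 5,          # A1: first die is at most 5
    lambda w: w[1] <= 5,          # A2: second die is at most 5
    lambda w: w[0] + w[1] <= 10,  # A3: total is at most 10
]
A_c = [lambda w, e=e: not e(w) for e in A]  # complements A1', A2', A3'

p_comp = [prob(e) for e in A_c]
p_union_comp = prob(lambda w: any(e(w) for e in A_c))
p_inter = prob(lambda w: all(e(w) for e in A))

# Three-event formula (1.4.4), applied to the complements.
pairwise = sum(prob(lambda w, e=e, f=f: e(w) and f(w))
               for e, f in combinations(A_c, 2))
triple = prob(lambda w: all(e(w) for e in A_c))
assert p_union_comp == sum(p_comp) - pairwise + triple

# Boole's inequality (1.4.6) and Bonferroni's inequality (1.4.7).
assert p_union_comp <= sum(p_comp)
assert p_inter >= 1 - sum(p_comp)
print(p_union_comp, sum(p_comp), p_inter)  # 11/36 5/12 25/36
```

Here Bonferroni's bound gives $P(A_1 \cap A_2 \cap A_3) \ge 1 - 15/36 = 21/36$, and the exact value is $25/36$.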


1.5 CONDITIONAL PROBABILITY


A major objective of probability modeling is to determine how likely it is that an
event A will occur when a certain experiment is performed. However, in numerous
cases the probability assigned to A will be affected by knowledge of the
occurrence or nonoccurrence of another event B. In such cases we will use
the terminology “conditional probability of A given B,” and the notation $P(A \mid B)$
will be used to distinguish between this new concept and ordinary probability
P(A).


Example 1.5.1 A box contains 100 microchips, some of which were produced by factory 1 and
the rest by factory 2. Some of the microchips are defective and some are good
(nondefective). An experiment consists of choosing one microchip at random
from the box and testing whether it is good or defective. Let A be the event
“obtaining a defective microchip”; consequently, $A'$ is the event “obtaining a
good microchip.” Let B be the event “the microchip was produced by factory 1”
and $B'$ the event “the microchip was produced by factory 2.” Table 1.1 gives the
number of microchips in each category.


TABLE 1.1 Numbers of defective and nondefective microchips from two factories

            B      B'     Totals
  A        15       5       20
  A'       45      35       80
  Totals   60      40      100


The probability of obtaining a defective microchip is

$$P(A) = \frac{n(A)}{n(S)} = \frac{20}{100} = 0.20$$


Now suppose that each microchip has a number stamped on it that identifies 
which factory produced it. Thus, before testing whether it is defective, it can be 
determined whether B has occurred (produced by factory 1) or B’ has occurred 
(produced by factory 2). Knowledge of which factory produced the microchip 
affects the likelihood that a defective microchip is selected, and the use of condi- 
tional probability is appropriate. For example, if the event B has occurred, then 
the only microchips we should consider are those in the first column of Table 1.1, 
and the total number is $n(B) = 60$. Furthermore, the only defective chips to
consider are those in both the first column and the first row, and the total number is
$n(A \cap B) = 15$. Thus, the conditional probability of A given B is

$$P(A \mid B) = \frac{n(A \cap B)}{n(B)} = \frac{15}{60} = 0.25$$
Notice that if we divide both the numerator and denominator by $n(S) = 100$, we
can express conditional probability in terms of some ordinary unconditional
probabilities,

$$P(A \mid B) = \frac{n(A \cap B)/n(S)}{n(B)/n(S)} = \frac{P(A \cap B)}{P(B)}$$


This last result can be derived under more general circumstances as follows.
Suppose we conduct an experiment with a sample space S, and suppose we are
given that the event B has occurred. We wish to know the probability that an
event A has occurred given that B has occurred, written $P(A \mid B)$. That is, we want
the probability of A relative to the reduced sample space B. We know that B can
be partitioned into two subsets,

$$B = (A \cap B) \cup (A' \cap B)$$

$A \cap B$ is the subset of B for which A is true, so the probability of A given B
should be proportional to $P(A \cap B)$, say $P(A \mid B) = kP(A \cap B)$. Similarly,
$P(A' \mid B) = kP(A' \cap B)$. Together these should represent the total probability
relative to B, so

$$\begin{aligned}
P(A \mid B) + P(A' \mid B) &= k[P(A \cap B) + P(A' \cap B)] \\
&= kP[(A \cap B) \cup (A' \cap B)] \\
&= kP(B) \\
&= 1
\end{aligned}$$

and $k = 1/P(B)$. That is,

$$P(A \mid B) = \frac{P(A \cap B)}{P(A \cap B) + P(A' \cap B)} = \frac{P(A \cap B)}{P(B)}$$

and $1/P(B)$ is the proportionality constant that makes the probabilities on the
reduced sample space add to 1.
Definition 1.5.1
The conditional probability of an event A, given the event B, is defined by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{1.5.1}$$

if $P(B) \ne 0$.


Relative to the sample space B, conditional probabilities defined by (1.5.1)
satisfy the original definition of probability, and thus conditional probabilities
enjoy all the usual properties of probability on the reduced sample space. For
example, if two events $A_1$ and $A_2$ are mutually exclusive, then

$$\begin{aligned}
P(A_1 \cup A_2 \mid B) &= \frac{P[(A_1 \cup A_2) \cap B]}{P(B)} \\
&= \frac{P[(A_1 \cap B) \cup (A_2 \cap B)]}{P(B)} \\
&= \frac{P(A_1 \cap B) + P(A_2 \cap B)}{P(B)} \\
&= P(A_1 \mid B) + P(A_2 \mid B)
\end{aligned}$$

This result generalizes to more than two events. Similarly, $P(A \mid B) \ge 0$ and
$P(S \mid B) = P(B \mid B) = 1$, so the conditions of a probability set function are satisfied.
Thus, the properties derived in Section 1.4 hold conditionally. In particular,

$$P(A \mid B) = 1 - P(A' \mid B) \tag{1.5.2}$$
$$0 \le P(A \mid B) \le 1 \tag{1.5.3}$$
$$P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B) - P(A_1 \cap A_2 \mid B) \tag{1.5.4}$$


The following theorem results immediately from equation (1.5.1): 


Theorem 1.5.1 For any events A and B,

$$P(A \cap B) = P(B)P(A \mid B) = P(A)P(B \mid A) \tag{1.5.5}$$


This sometimes is referred to as the Multiplication Theorem of probability. It 
provides a way to compute the probability of the joint occurrence of A and B by 
multiplying the probability of one event and the conditional probability of the 
other event. In terms of Example 1.5.1, we can compute directly
$P(A \cap B) = 15/100 = 0.15$, or we can compute it as $P(B)P(A \mid B) = (60/100)(15/60) = 0.15$
or $P(A)P(B \mid A) = (20/100)(15/20) = 0.15$.
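The computations for Example 1.5.1 can be reproduced from the counts in Table 1.1. This is a minimal Python sketch, assuming a microchip is chosen at random so that each cell count divided by 100 is a probability. The dictionary layout and the helper `p` are illustrative choices, not notation from the text. The sketch applies Definition 1.5.1 and checks that both factorizations in (1.5.5) agree.

```python
from fractions import Fraction

# Table 1.1 counts: (row, column) -> number of microchips.
counts = {
    ("A", "B"): 15, ("A", "B'"): 5,
    ("A'", "B"): 45, ("A'", "B'"): 35,
}
n_S = sum(counts.values())  # 100

def p(row=None, col=None):
    """Probability of the cells matching the given row and/or column label."""
    total = sum(n for (r, c), n in counts.items()
                if (row is None or r == row) and (col is None or c == col))
    return Fraction(total, n_S)

p_A, p_B, p_AB = p(row="A"), p(col="B"), p(row="A", col="B")
p_A_given_B = p_AB / p_B   # Definition 1.5.1
p_B_given_A = p_AB / p_A

# Multiplication Theorem (1.5.5): both factorizations give P(A ∩ B).
assert p_B * p_A_given_B == p_A * p_B_given_A == p_AB
print(p_A, p_A_given_B, p_AB)  # 1/5 1/4 3/20
```

Dividing cell counts by a row or column total, rather than by `n_S`, gives the same conditional probabilities. That is the reduced-sample-space view described above.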

Formula (1.5.5) also is quite useful in dealing with problems involving sampling 
without replacement. Such experiments consist of choosing objects one at a time 
from a finite collection, without replacing chosen objects before the next choice. 
Perhaps the most common example of this is dealing cards from a deck. 


Example 1.5.2 Two cards are drawn without replacement from a deck of cards. Let $A_1$ denote
the event of getting “an ace on the first draw” and $A_2$ denote the event of getting
“an ace on the second draw.”

The number of ways in which different outcomes can occur can be enumerated,
and the results are given in Table 1.2. The enumeration of possible outcomes can
be a tedious problem, and techniques that are helpful in such counting
problems are discussed in Section 1.6. The values in this example are based on
the so-called multiplication principle, which says that if there are $n_1$ ways of
doing one thing and $n_2$ ways of doing another, then there are $n_1 \cdot n_2$ ways of
doing both. Thus, for example, the total number of ordered two-card hands that
can be formed from 52 cards (without replacement) is $52 \cdot 51 = 2652$. Similarly,
the number of ordered two-card hands in which both cards are aces is $4 \cdot 3$, the
number in which the first card is an ace and the second is not an ace is $4 \cdot 48$, and
so forth. The appropriate products for all cases are provided in Table 1.2.


TABLE 1.2 Partitioning the numbers of ways to draw two cards

            A1        A1'        Totals
  A2        4·3       48·4       4·51
  A2'       4·48      48·47      48·51
  Totals    4·51      48·51      52·51


For example, the probability of getting “an ace on the first draw and an ace on
the second draw” is given by

$$P(A_1 \cap A_2) = \frac{4 \cdot 3}{52 \cdot 51}$$

Suppose one is interested in $P(A_1)$ without regard to what happens on the second
draw. First note that $A_1$ may be partitioned as

$$A_1 = (A_1 \cap A_2) \cup (A_1 \cap A_2')$$

so that

$$\begin{aligned}
P(A_1) &= P(A_1 \cap A_2) + P(A_1 \cap A_2') \\
&= \frac{4 \cdot 3}{52 \cdot 51} + \frac{4 \cdot 48}{52 \cdot 51} \\
&= \frac{4 \cdot 51}{52 \cdot 51} = \frac{4}{52}
\end{aligned}$$


This same result would have occurred if $A_1$ had been partitioned by another
event, say B, which deals only with the face value of the second card. This follows
because $n(B \cup B') = 51$, and relative to the $52 \cdot 51$ ordered pairs of cards,

$$n(A_1) = 4 \cdot n(B) + 4 \cdot n(B') = 4 \cdot n(B \cup B') = 4 \cdot 51$$


The numerators of probabilities such as $P(A_1)$, $P(A_1')$, $P(A_2)$, and $P(A_2')$, which
deal with only one of the draws, appear in the margins of Table 1.2. These
probabilities may be referred to as marginal probabilities. Note that the marginal
probabilities in fact can be computed directly from the original 52-card sample
space, and it is not necessary to consider the sample space of ordered pairs at all.
For example, $P(A_1) = (4 \cdot 51)/(52 \cdot 51) = 4/52$, which is the probability that would
be obtained for one draw from the original 52-card sample space. Clearly, this
result would apply to sampling-without-replacement problems in general. What
may be less intuitive is that these results also apply to marginal probabilities such
as $P(A_2)$, and not just to the outcomes on the first draw. That is, if the outcome of
the first draw is not known, then $P(A_2)$ also can be computed from the original
sample space and is given by $P(A_2) = 4/52$. This can be verified in this example
because

$$A_2 = (A_2 \cap A_1) \cup (A_2 \cap A_1')$$

and

$$P(A_2) = \frac{4 \cdot 3}{52 \cdot 51} + \frac{48 \cdot 4}{52 \cdot 51} = \frac{4 \cdot 51}{52 \cdot 51} = \frac{4}{52}$$


Indeed, if the result of the first draw is not known, then the second draw could 
just as well be considered as the first draw.

The conditional probability that an ace is drawn on the second draw given
that an ace was obtained on the first draw is

$$P(A_2 \mid A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)} = \frac{(4 \cdot 3)/(52 \cdot 51)}{(4 \cdot 51)/(52 \cdot 51)} = \frac{3}{51}$$

That is, given that $A_1$ is true, we are restricted to the first column of Table 1.2,
and the relative proportion of the time that $A_2$ is true on the reduced sample
space is $(4 \cdot 3)/[(4 \cdot 3) + (4 \cdot 48)]$. Again, it may be less obvious, but it is possible
to carry this problem one step further and compute $P(A_2 \mid A_1)$ directly in terms of
the 51-card conditional sample space, and obtain the much simpler solution that
$P(A_2 \mid A_1) = 3/51$, there being three aces remaining in the 51 remaining cards in
the conditional sample space. Thus, it is common practice in this type of problem
to compute the conditional probabilities and marginal probabilities directly from
the one-dimensional sample spaces (one marginal and one conditional space),
rather than obtain the joint probabilities from the joint sample space of ordered
pairs. For example,
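$$P(A_1 \cap A_2) = P(A_1)P(A_2 \mid A_1) = \left(\frac{4}{52}\right)\left(\frac{3}{51}\right)$$

This procedure would extend to three or more draws (without replacement)
where, for example, if $A_3$ denotes obtaining “an ace on the third draw,” then

$$P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) = \left(\frac{4}{52}\right)\left(\frac{3}{51}\right)\left(\frac{2}{50}\right)$$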


An indication of the general validity of this approach for computing conditional
probabilities is obtained by considering $P(A_2 \mid A_1)$ in the example. Relative to
the joint sample space of ordered pairs, $204 = 4 \cdot 51$, where 4 represents the
number of ways the given event $A_1$ can occur on the first draw and 51 is the total
number of possible outcomes in the conditional sample space for the second
draw; also, $12 = 4 \cdot 3$ represents the number of ways the given event $A_1$ can
occur times the number of ways a success, $A_2$, can occur in the conditional
sample space. Because the number of ways $A_1$ can occur is a common multiplier
in the numerator and denominator when counting ordered pairs, one may
equivalently count directly in the one-dimensional conditional space associated with
the second draw.

The computational advantage of this approach is obvious, because it allows
the computation of the probability of an event in a complicated
higher-dimensional product space as a product of probabilities, one marginal and the
others conditional, of events in simpler one-dimensional sample spaces.
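The agreement between the joint sample space of ordered pairs and the sequential shortcut can be confirmed by brute force. In this Python sketch, which is an illustration and not from the text, the cards are labeled 0 to 51 and cards 0 to 3 are arbitrarily treated as the aces. Ordered draws without replacement are generated with `itertools.permutations`.

```python
from fractions import Fraction
from itertools import permutations

# Label the 52 cards 0..51 and (arbitrarily) treat cards 0-3 as the four aces.
ACES = {0, 1, 2, 3}

# Joint sample space: all 52 * 51 ordered pairs of distinct cards.
pairs = list(permutations(range(52), 2))
n = len(pairs)

p_A1 = Fraction(sum(i in ACES for i, _ in pairs), n)   # ace on draw 1
p_A2 = Fraction(sum(j in ACES for _, j in pairs), n)   # ace on draw 2
p_A1A2 = Fraction(sum(i in ACES and j in ACES for i, j in pairs), n)

# Marginals match the one-draw, 52-card sample space.
assert p_A1 == p_A2 == Fraction(4, 52)
# P(A2 | A1) from the joint space equals the 51-card shortcut 3/51.
assert p_A1A2 / p_A1 == Fraction(3, 51)

# Three draws: the sequential product agrees with counting ordered triples.
p_three = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
aces_in_all_three = sum(
    i in ACES and j in ACES and k in ACES
    for i, j, k in permutations(range(52), 3)
)
assert Fraction(aces_in_all_three, 52 * 51 * 50) == p_three
print(n, p_A1A2, p_three)  # 2652 1/221 1/5525
```

The brute-force counts grow quickly (132,600 ordered triples here). This is why the sequential product of one marginal and several conditional probabilities is preferred in practice.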


The above discussion is somewhat tedious, but it may provide insight into the 
physical meaning of conditional probability and marginal probability, and also 
into the topic of sampling without replacement, which will come up again in the 
following sections. 


TOTAL PROBABILITY AND BAYES’ RULE 


As noted in Example 1.5.2, it sometimes is useful to partition an event, say A, into
the union of two or more mutually exclusive events. For example, if B and $B'$ are
events that pertain to the first draw from a deck, and if A is an event that
pertains to the second draw, then it is worthwhile to consider the partition
$A = (A \cap B) \cup (A \cap B')$ to compute P(A), because this separates A into two events
that involve information about both draws. More generally, if $B_1, B_2, \ldots, B_k$
are mutually exclusive and exhaustive, in the sense that $B_1 \cup B_2 \cup \cdots \cup B_k = S$, then

$$A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_k)$$

This is useful in the following theorem.


Theorem 1.5.2 Total Probability If $B_1, B_2, \ldots, B_k$ is a collection of mutually exclusive and
exhaustive events, then for any event A,

$$P(A) = \sum_{i=1}^{k} P(B_i)P(A \mid B_i) \tag{1.5.6}$$

Proof
The events $A \cap B_1, A \cap B_2, \ldots, A \cap B_k$ are mutually exclusive, so it follows
that

$$P(A) = \sum_{i=1}^{k} P(A \cap B_i) \tag{1.5.7}$$

and the theorem results from applying Theorem 1.5.1 to each term in this
summation. ■


Theorem 1.5.2 sometimes is known as the Law of Total Probability, because it
corresponds to mutually exclusive ways in which A can occur relative to a
partition of the total sample space S.

Sometimes it is helpful to illustrate this result with a tree diagram. One such
diagram for the case of three events $B_1$, $B_2$, and $B_3$ is given in Figure 1.3.


FIGURE 1.3 Tree diagram showing the Law of Total Probability (branches $B_1$, $B_2$, $B_3$, each followed by a branch labeled A)
The probability associated with branch $B_i$ is $P(B_i)$, and the probability
associated with each branch labeled A is a conditional probability $P(A \mid B_i)$, which
may be different depending on which branch, $B_i$, it follows. For A to occur, it
must occur jointly with one and only one of the events $B_i$. Thus, if A occurs, exactly
one of $A \cap B_1$, $A \cap B_2$, or $A \cap B_3$ occurs, and the probability of A is the sum of the
probabilities of these joint events, $P(B_i)P(A \mid B_i)$.


Example 1.5.3 Factory 1 in Example 1.5.1 has two shifts, and the microchips from factory 1 can
be categorized according to which shift produced them. As before, the experiment
consists of choosing a microchip at random from the box and testing to see
whether it is defective. Let $B_1$ be the event “produced by shift 1” (factory 1), $B_2$
the event “produced by shift 2” (factory 1), and $B_3$ the event “produced by factory
2.” As before, let A be the event “obtaining a defective microchip.” The categories
are given by Table 1.3.


TABLE 1.3 Numbers of defective and nondefective microchips from a common lot

            B1     B2     B3     Totals
  A          5     10      5       20
  A'        20     25     35       80
  Totals    25     35     40      100


Various probabilities can be computed directly from the table. For example,
$P(B_1) = 25/100$, $P(B_2) = 35/100$, $P(B_3) = 40/100$, $P(A \mid B_1) = 5/25$,
$P(A \mid B_2) = 10/35$, and $P(A \mid B_3) = 5/40$. It is possible to compute P(A) either directly from
the table, $P(A) = 20/100 = 0.20$, or by using the Law of Total Probability:

$$\begin{aligned}
P(A) &= P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) + P(B_3)P(A \mid B_3) \\
&= \left(\frac{25}{100}\right)\left(\frac{5}{25}\right) + \left(\frac{35}{100}\right)\left(\frac{10}{35}\right) + \left(\frac{40}{100}\right)\left(\frac{5}{40}\right) \\
&= 0.05 + 0.10 + 0.05 = 0.20
\end{aligned}$$

This problem is illustrated by the tree diagram in Figure 1.4.
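The Law of Total Probability for Example 1.5.3 can be written as a short function. This is a Python sketch, not from the text, assuming the counts in Table 1.3. The function name `total_probability` and the dictionary keys `"B1"` to `"B3"` are illustrative. The result is checked against reading $P(A)$ directly from the table.

```python
from fractions import Fraction

# Table 1.3: source -> (number defective, number of microchips from that source).
table = {"B1": (5, 25), "B2": (10, 35), "B3": (5, 40)}
n_S = sum(total for _, total in table.values())  # 100 microchips in all

def total_probability(priors, likelihoods):
    """Law of Total Probability (1.5.6): sum over i of P(B_i) * P(A | B_i)."""
    return sum(priors[b] * likelihoods[b] for b in priors)

priors = {b: Fraction(total, n_S) for b, (_, total) in table.items()}     # P(B_i)
likelihoods = {b: Fraction(d, total) for b, (d, total) in table.items()}  # P(A | B_i)

p_A = total_probability(priors, likelihoods)
# Same answer as reading P(A) = 20/100 directly from the table.
assert p_A == Fraction(sum(d for d, _ in table.values()), n_S)
print(p_A)  # 1/5
```

Each term `priors[b] * likelihoods[b]` is one root-to-leaf path of the tree diagram, and its value is the corresponding cell count divided by 100.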


FIGURE 1.4 Tree diagram for selection of microchips from combined lot (branch probabilities $P(A \mid B_1) = 5/25$ to $A \cap B_1$, $P(A \mid B_2) = 10/35$ to $A \cap B_2$, and $P(A \mid B_3) = 5/40$ to $A \cap B_3$)


Example 1.5.4 Consider the following variation on Example 1.5.3. The microchips are sorted
into three separate boxes. Box 1 contains the 25 microchips from shift 1, box 2
contains the 35 microchips from shift 2, and box 3 contains the remaining 40
microchips from factory 2. The new experiment consists of choosing a box at
random, then selecting a microchip from the box. This experiment is illustrated in
Figure 1.5.


FIGURE 1.5 Selection of microchips from three different sources: Box 1 (5 defective, 20 good), Box 2 (10 defective, 25 good), Box 3 (5 defective, 35 good)


In this case, it is not possible to compute P(A) directly from Table 1.3, but it
still is possible to use equation (1.5.6) by redefining the events $B_1$, $B_2$, and $B_3$ to
be respectively choosing “box 1,” “box 2,” and “box 3.” Thus, the new assignment
of probability to $B_1$, $B_2$, and $B_3$ is $P(B_1) = P(B_2) = P(B_3) = 1/3$, and

$$P(A) = \left(\frac{1}{3}\right)\left(\frac{5}{25}\right) + \left(\frac{1}{3}\right)\left(\frac{10}{35}\right) + \left(\frac{1}{3}\right)\left(\frac{5}{40}\right) = \frac{57}{280}$$


As a result of this new experiment, suppose that the component obtained is 
defective, but it is not known which box it came from. It is possible to compute 
the probability that it came from a particular box given that it was defective, 
although a special formula is required. 
Theorem 1.5.3 Bayes’ Rule If we assume the conditions of Theorem 1.5.2, then for each
$j = 1, 2, \ldots, k$,

$$P(B_j \mid A) = \frac{P(B_j)P(A \mid B_j)}{\sum_{i=1}^{k} P(B_i)P(A \mid B_i)} \tag{1.5.8}$$

Proof
From Definition 1.5.1 and the Multiplication Theorem (1.5.5) we have

$$P(B_j \mid A) = \frac{P(A \cap B_j)}{P(A)} = \frac{P(B_j)P(A \mid B_j)}{P(A)}$$

The theorem follows by replacing the denominator with the right side of (1.5.6). ■


For the data of Example 1.5.4, the conditional probability that the microchip
came from box 1, given that it is defective, is

$$P(B_1 \mid A) = \frac{(1/3)(5/25)}{(1/3)(5/25) + (1/3)(10/35) + (1/3)(5/40)} = \frac{56}{171} \approx 0.327$$

Similarly, $P(B_2 \mid A) = 80/171 \approx 0.468$ and $P(B_3 \mid A) = 35/171 \approx 0.205$.

Notice that these differ from the unconditional probabilities, $P(B_j) = 1/3 \approx 0.333$.
This reflects the different proportions of defective items in the boxes. In
other words, because box 2 has a higher proportion of defectives, choosing a
defective item effectively increases the likelihood that it was chosen from box 2.

For another illustration, consider the following example.


Example 1.5.5 A man starts at the point O on the map shown in Figure 1.6. He first chooses a
path at random and follows it to point $B_1$, $B_2$, or $B_3$. From that point,
he chooses a new path at random and follows it to one of the points $A_i$, $i = 1, 2, \ldots, 7$.


FIGURE 1.6 Map of possible paths (starting point O; intermediate points $B_1$, $B_2$, $B_3$; endpoints $A_1$ through $A_7$)
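It might be of interest to know the probability that the man arrives at point
$A_4$. This can be computed from the Law of Total Probability:

$$\begin{aligned}
P(A_4) &= P(B_1)P(A_4 \mid B_1) + P(B_2)P(A_4 \mid B_2) + P(B_3)P(A_4 \mid B_3) \\
&= \left(\frac{1}{3}\right)\left(\frac{1}{4}\right) + \left(\frac{1}{3}\right)\left(\frac{1}{2}\right) + \left(\frac{1}{3}\right)(0) = \frac{1}{4}
\end{aligned}$$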


Suppose the man arrives at point $A_4$, but it is not known which route he
took. The probability that he passed through a particular point, $B_1$, $B_2$, or $B_3$,
can be computed from Bayes’ Rule. For example,

$$P(B_1 \mid A_4) = \frac{(1/3)(1/4)}{(1/3)(1/4) + (1/3)(1/2) + (1/3)(0)} = \frac{1}{3}$$

which agrees with the unconditional probability, $P(B_1) = 1/3$.

This is an example of a very special situation called “independence,” which we
will pursue in the next section. However, this does not occur in every case. For
example, an application of Bayes’ Rule also leads to $P(B_2 \mid A_4) = 2/3$, which does
not agree with $P(B_2) = 1/3$. Thus, if he arrived at point $A_4$, it is twice as likely
that he passed through point $B_2$ as it is that he passed through $B_1$. Of course, the
most striking result concerns point $B_3$, because $P(B_3 \mid A_4) = 0$, while $P(B_3) = 1/3$.
This reflects the obvious fact that he cannot arrive at point $A_4$ by passing
through point $B_3$. The practical value of conditioning is obvious when considering
some action such as betting on whether the man passed through a particular point $B_j$.
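Bayes’ Rule (1.5.8) fits in a few lines of code. The following Python sketch is illustrative; the function name `bayes` is not from the text. It assumes the $B_j$ are mutually exclusive and exhaustive and takes the priors $P(B_j)$ and likelihoods $P(A \mid B_j)$ as dictionaries. It is applied to the boxes of Example 1.5.4 and to the path example above. For the path example, only the values $P(A_4 \mid B_j) = 1/4, 1/2, 0$ stated in the text are used.

```python
from fractions import Fraction as F

def bayes(priors, likelihoods):
    """Bayes' Rule (1.5.8): return P(A) and the posteriors P(B_j | A).

    priors[j] = P(B_j) and likelihoods[j] = P(A | B_j); the B_j are assumed
    to be mutually exclusive and exhaustive.
    """
    p_A = sum(priors[j] * likelihoods[j] for j in priors)  # equation (1.5.6)
    return p_A, {j: priors[j] * likelihoods[j] / p_A for j in priors}

third = F(1, 3)

# Example 1.5.4: a box is chosen at random, then a microchip from that box.
p_A, post = bayes({1: third, 2: third, 3: third},
                  {1: F(5, 25), 2: F(10, 35), 3: F(5, 40)})
print(p_A, post[1], post[2], post[3])  # 57/280 56/171 80/171 35/171

# Example 1.5.5: arriving at A4, using P(A4 | B_j) = 1/4, 1/2, 0.
p_A4, post_path = bayes({1: third, 2: third, 3: third},
                        {1: F(1, 4), 2: F(1, 2), 3: F(0)})
print(p_A4, post_path[1], post_path[2], post_path[3])  # 1/4 1/3 2/3 0
```

The denominator is the same total probability computed with (1.5.6), so one pass over the branches yields both P(A) and every posterior.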


INDEPENDENT EVENTS 


In some situations, knowledge that an event A has occurred will not affect the
probability that an event B will occur. In other words, $P(B \mid A) = P(B)$. We saw
this happen in Example 1.5.5, because the probability of passing through point
$B_1$ was 1/3 whether or not the knowledge that the man arrived at point $A_4$ was taken
into account. As a result of the Multiplication Theorem (1.5.5), an equivalent
formulation of this situation is $P(A \cap B) = P(A)P(B \mid A) = P(A)P(B)$. In general,
when this happens the two events are said to be independent or stochastically
independent.


Definition 1.5.2
Two events A and B are called independent events if

$$P(A \cap B) = P(A)P(B) \tag{1.5.9}$$

Otherwise, A and B are called dependent events.


As already noted, an equivalent formulation can be given in terms of condi- 
tional probability. 
Theorem 1.5.4 If A and B are events such that $P(A) > 0$ and $P(B) > 0$, then A and B are
independent if and only if either of the following holds:

$$P(A \mid B) = P(A) \qquad P(B \mid A) = P(B)$$


We saw examples of both independent and dependent events in Example 1.5.5.
There was also an example of mutually exclusive events, because $P(B_3 \mid A_4) = 0$,
which implies $P(B_3 \cap A_4) = 0$. There is often confusion between the concepts of
independent events and mutually exclusive events. Actually, these are quite
different notions, and perhaps this is seen best by comparisons involving
conditional probabilities. Specifically, if A and B are mutually exclusive, then
$P(A \mid B) = P(B \mid A) = 0$, whereas for independent nonnull events the conditional
probabilities are nonzero as noted by Theorem 1.5.4. In other words, the
property of being mutually exclusive involves a very strong form of dependence,
because, for nonnull events, the occurrence of one event precludes the occurrence
of the other event.

There are many applications in which events are assumed to be independent. 


Example 1.5.6 A “system” consists of several components that are hooked up in some particular
configuration. It is often assumed in applications that the failure of one
component does not affect the likelihood that another component will fail. Thus, the
failure of one component is assumed to be independent of the failure of another
component.

A series system of two components, $C_1$ and $C_2$, is illustrated by Figure 1.7. It is
easy to think of such a system in terms of two electrical components (for example,
batteries in a flashlight) where current must pass through both components for
the system to function. If $A_1$ is the event “$C_1$ fails” and $A_2$ is the event “$C_2$ fails,”
then the event “the system fails” is $A_1 \cup A_2$. Suppose that $P(A_1) = 0.1$ and
$P(A_2) = 0.2$. If we assume that $A_1$ and $A_2$ are independent, then the probability
that the system fails is

$$\begin{aligned}
P(A_1 \cup A_2) &= P(A_1) + P(A_2) - P(A_1 \cap A_2) \\
&= P(A_1) + P(A_2) - P(A_1)P(A_2) \\
&= 0.1 + 0.2 - (0.1)(0.2) = 0.28
\end{aligned}$$

The probability that the system works properly is $1 - 0.28 = 0.72$.


FIGURE 1.7 Series system of two components
Notice that the assumption of independence permits us to factor the probability
of the joint event, $P(A_1 \cap A_2)$, into the product of the marginal probabilities,
$P(A_1)P(A_2)$.

Another common example involves the notion of a parallel system, as
illustrated in Figure 1.8. For a parallel system to fail, it is necessary that both
components fail, so the event “the system fails” is $A_1 \cap A_2$. The probability that this
system fails is $P(A_1 \cap A_2) = P(A_1)P(A_2) = (0.1)(0.2) = 0.02$, again assuming the
components fail independently.

Note that the probability of failure for a series system is greater than the
probability of failure of either component, whereas for a parallel system it is less. This
is because both components must function for a series system to function, and
consequently the system is more likely to fail than an individual component. On
the other hand, a parallel system is a redundant system: one component can fail,
but the system will continue to function provided the other component functions.
Such redundancy is common in aerospace systems, where the failure of the
system may be catastrophic.
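The series and parallel failure probabilities reduce to two one-line formulas under the independence assumption. A minimal Python sketch follows; the function names are hypothetical and the values 0.1 and 0.2 are those of Example 1.5.6. Floating-point results are rounded for display.

```python
def series_failure(p1, p2):
    """P(A1 ∪ A2) = P(A1) + P(A2) - P(A1)P(A2), assuming independent failures."""
    return p1 + p2 - p1 * p2

def parallel_failure(p1, p2):
    """P(A1 ∩ A2) = P(A1)P(A2), assuming independent failures."""
    return p1 * p2

p1, p2 = 0.1, 0.2
series, parallel = series_failure(p1, p2), parallel_failure(p1, p2)
print(round(series, 4), round(1 - series, 4), round(parallel, 4))  # 0.28 0.72 0.02

# A series system is less reliable than either component; a parallel one is more.
assert series > max(p1, p2) and parallel < min(p1, p2)
```

Both formulas rely on $P(A_1 \cap A_2) = P(A_1)P(A_2)$. If the component failures were dependent, the joint probability would have to be supplied separately.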

A common example of dependent events occurs in connection with repeated
sampling without replacement from a finite collection. In Example 1.5.2 we
considered the results of drawing two cards in succession from a deck. It turns out
that the events $A_1$ (ace on the first draw) and $A_2$ (ace on the second draw) are
dependent because $P(A_2) = 4/52$, while $P(A_2 \mid A_1) = 3/51$.

Suppose instead that the outcome of the first card is recorded and then the
card is replaced in the deck and the deck is shuffled before the second draw is
made. This type of sampling is referred to as sampling with replacement, and it
would be reasonable to assume that the draws are independent trials. In this case
$P(A_1 \cap A_2) = P(A_1)P(A_2)$.

There are many other problems in which it is reasonable to assume that
repeated trials of an experiment are independent, such as tossing a coin or rolling
a die repeatedly.
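The contrast between the two sampling schemes can be made concrete by enumeration. This Python sketch is illustrative and not from the text. It labels the cards 0 to 51, treats cards 0 to 3 as the aces (an arbitrary choice), and tests the product rule (1.5.9) over equally likely ordered pairs with and without replacement.

```python
from fractions import Fraction
from itertools import permutations, product

ACES = {0, 1, 2, 3}  # cards 0-3 stand in for the four aces (arbitrary labeling)

def ace_probs(pairs):
    """Return P(A1), P(A2), P(A1 ∩ A2) over equally likely ordered pairs."""
    pairs = list(pairs)
    n = len(pairs)
    p1 = Fraction(sum(i in ACES for i, _ in pairs), n)
    p2 = Fraction(sum(j in ACES for _, j in pairs), n)
    p12 = Fraction(sum(i in ACES and j in ACES for i, j in pairs), n)
    return p1, p2, p12

# Without replacement: 52 * 51 ordered pairs of distinct cards.
p1, p2, p12 = ace_probs(permutations(range(52), 2))
print(p12 == p1 * p2)  # False: the draws are dependent

# With replacement: all 52 * 52 ordered pairs; a card may repeat.
q1, q2, q12 = ace_probs(product(range(52), repeat=2))
print(q12 == q1 * q2)  # True: the draws are independent
```

In both schemes the marginal probabilities equal 1/13. Only the joint probability changes, from 1/221 without replacement to 1/169 with replacement.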
It is possible to show that independence of two events also implies the indepen- 
dence of some related events. 


FIGURE 1.8 Parallel system of two components
Theorem 1.5.5 Two events A and B are independent if and only if the following pairs of events
are also independent: 


1. A and B’. 
2. A’ and B. 
3. A’ and B’. 


Proof 


See Exercise 38. 


It is also possible to extend the notion of independence to more than two 
events. 


Definition 1.5.3
The k events $A_1, A_2, \ldots, A_k$ are said to be independent or mutually independent if for
every $j = 2, 3, \ldots, k$ and every subset of distinct indices $i_1, i_2, \ldots, i_j$,

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_j}) \tag{1.5.10}$$


Suppose we wish to verify that three events A, B, and C are mutually independent. According to the
definition of mutually independent events, it is not sufficient simply to verify
pairwise independence. It would be necessary to verify $P(A \cap B) = P(A)P(B)$,
$P(A \cap C) = P(A)P(C)$, $P(B \cap C) = P(B)P(C)$, and also
$P(A \cap B \cap C) = P(A)P(B)P(C)$. The following examples show that pairwise independence does
not imply this last three-way factorization and vice versa.


Example 1.5.7 A box contains eight tickets, each labeled with a binary number. Two are labeled
111, two are labeled 100, two 010, and two 001. An experiment consists of
drawing one ticket at random from the box. Let A be the event “the first digit is
1,” B the event “the second digit is 1,” and C the event “the third digit is 1.” This
is illustrated by Figure 1.9. It follows that $P(A) = P(B) = P(C) = 4/8 = 1/2$ and
that $P(A \cap B) = P(A \cap C) = P(B \cap C) = 2/8 = 1/4$; thus A, B, and C are
pairwise independent. However, they are not mutually independent, because

$$P(A \cap B \cap C) = \frac{2}{8} = \frac{1}{4} \ne \frac{1}{8} = P(A)P(B)P(C)$$
FIGURE 1.9 Selection of numbered tickets


Example 1.5.8 In Figure 1.9, let us change the number on one ticket in the first column from 111
to 110, and the number on one ticket in the second column from 100 to 101. We
still have

$$P(A) = P(B) = P(C) = \frac{1}{2}$$

but

$$P(B \cap C) = \frac{1}{8} \ne \frac{1}{4} = P(B)P(C)$$

and

$$P(A \cap B \cap C) = \frac{1}{8} = P(A)P(B)P(C)$$

In this case we have three-way factorization, but not independence of all pairs.
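Both ticket examples can be checked mechanically. In this Python sketch, which is illustrative and not from the text, the hypothetical helper `check` takes a list of equally likely ticket labels. It reports whether A, B, C are pairwise independent and whether the three-way factorization holds. The ticket lists follow Examples 1.5.7 and 1.5.8, with one 111 changed to 110 and one 100 changed to 101 in the second list.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def check(tickets):
    """Return (pairwise independent?, three-way factorization holds?).

    Event k (k = 0, 1, 2 for A, B, C) is "digit k of the drawn ticket is 1";
    each ticket is assumed equally likely to be drawn.
    """
    n = len(tickets)

    def p(*digits):
        # Probability that every listed digit of the drawn ticket equals "1".
        return Fraction(sum(all(t[d] == "1" for d in digits) for t in tickets), n)

    pairwise = all(p(i, j) == p(i) * p(j) for i, j in combinations(range(3), 2))
    three_way = p(0, 1, 2) == prod(p(i) for i in range(3))
    return pairwise, three_way

example_1_5_7 = ["111", "111", "100", "100", "010", "010", "001", "001"]
# Example 1.5.8: one 111 becomes 110 and one 100 becomes 101.
example_1_5_8 = ["111", "110", "100", "101", "010", "010", "001", "001"]
print(check(example_1_5_7))  # (True, False)
print(check(example_1_5_8))  # (False, True)
```

Mutual independence in the sense of Definition 1.5.3 requires both flags to be `True`, and neither ticket box meets that requirement.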


1.6 COUNTING TECHNIQUES


In many experiments with finite sample spaces, such as games of chance, it may 
be reasonable to assume that all possible outcomes are equally likely. In that 
case, a realistic probability model should result by following the classical 
approach and taking the probability of any event A to be P(A) = n(A)/N, where 
N is the total number of possible outcomes and n(A) is the number of these 
outcomes that correspond to occurrence of the event A. Counting the number of 
ways in which an event may occur can be a tedious problem in complicated 
experiments. A few helpful counting techniques will be discussed. 
MULTIPLICATION PRINCIPLE


First note that if one operation can be performed in $n_1$ ways and a second
operation can be performed in $n_2$ ways, then there are $n_1 \cdot n_2$ ways in which both
operations can be carried out.


Example 1.6.1 Suppose a coin is tossed and then a marble is selected at random from a box
containing one black (B), one red (R), and one green (G) marble. The possible
outcomes are HB, HR, HG, TB, TR, and TG. For each of the two possible
outcomes of the coin there are three marbles that may be selected for a total of
$2 \cdot 3 = 6$ possible outcomes. The situation also is easily illustrated by a tree
diagram, as in Figure 1.10.


FIGURE 1.10 Tree diagram of two-stage experiment


Another application of the multiplication principle was discussed in Example 
1.5.2 in connection with counting the number of ordered two-card hands. 

Note that the multiplication principle can be extended to more than two
operations. In particular, if the ith of r successive operations can be performed in $n_i$
ways, then the total number of ways to carry out all r operations is the product

$$\prod_{i=1}^{r} n_i = n_1 \cdot n_2 \cdots n_r \tag{1.6.1}$$
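The multiplication principle corresponds directly to a Cartesian product. As a sketch in Python (not from the text), `itertools.product` lists the coin-and-marble outcomes of Example 1.6.1 in the same order as the tree diagram. A third stage, a hypothetical die roll added only for illustration, checks equation (1.6.1) with r = 3.

```python
from itertools import product
from math import prod

coin = ["H", "T"]
marbles = ["B", "R", "G"]  # black, red, green

# Every (coin, marble) pair: each of the 2 coin outcomes with each of 3 marbles.
outcomes = [c + m for c, m in product(coin, marbles)]
print(outcomes)  # ['HB', 'HR', 'HG', 'TB', 'TR', 'TG']
assert len(outcomes) == len(coin) * len(marbles) == 6

# Equation (1.6.1) with r = 3 stages; the die stage is a hypothetical addition.
die = ["1", "2", "3", "4", "5", "6"]
stages = [coin, marbles, die]
assert len(list(product(*stages))) == prod(len(s) for s in stages) == 36
```

Listing the outcomes is practical only for small cases. The product formula gives the count without enumeration.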


One standard type of counting problem is covered by the following theorem. 


Theorem 1.6.1 If there are N possible outcomes of each of r trials of an experiment, then there
are $N^r$ possible outcomes in the sample space. ■


Example 1.6.2 How many ways can a 20-question true-false test be answered? The answer
is $2^{20}$.


Example 1.6.3 


Example 7.6.4 
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How many subsets are there from a set of m elements? In forming a subset, one 
must decide for each element whether to include that element in the subset. Thus 
for each of m elements there are two choices, which give a total of 2” possible 
subsets. This includes the null set, which corresponds to the case of not including 
any element in the subset. 


As suggested earlier, the way an experiment is carried out or the method of 
sampling may affect the sample space and the probability assignment over the 
sample space. In particular, sampling items from a finite population with and 
without replacement are two common schemes. Sampling without replacement 
was illustrated in Example 1.5.2. Sampling with replacement is covered by Theorem 1.6.1.


Example 1.6.4

If five cards are drawn from a deck of 52 cards with replacement, then there are $52^5$ possible hands. If the five cards are drawn without replacement, then the more general multiplication principle may be applied to determine that there are $52 \cdot 51 \cdot 50 \cdot 49 \cdot 48$ possible hands. In the first case, the same card may occur more than once in the same hand. In the second case, however, a card may not be repeated.
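The two counts in this example can be computed directly. To confirm that the two rules really differ in the way described, the sketch below also enumerates a hypothetical 5-card deck, so the brute force stays cheap. It is our illustration, not the text's.

```python
from itertools import permutations, product

# Ordered five-card hands from a 52-card deck.
with_replacement = 52 ** 5                    # Theorem 1.6.1: N = 52, r = 5
without_replacement = 52 * 51 * 50 * 49 * 48  # multiplication principle, no repeats

print(with_replacement)     # 380204032
print(without_replacement)  # 311875200

# Same two rules on a hypothetical 5-card deck, drawing 3 cards in order.
deck = range(5)
assert sum(1 for _ in product(deck, repeat=3)) == 5 ** 3    # repeats allowed
assert sum(1 for _ in permutations(deck, 3)) == 5 * 4 * 3   # no repeats
```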


Note that in both cases in the above example, order is considered important. That is, two five-card hands may eventually end up with the same five cards, but they are counted as different hands in the example if the cards were obtained in a different order. For example, let all five cards be spades. The outcome (ace, king, queen, jack, ten) is different from the outcome (king, ace, queen, jack, ten). If order had not been considered important, both of these outcomes would be considered the same; indeed, there would be $5! = 120$ different ordered outcomes corresponding to this same (unordered) outcome. On the other hand, only one outcome corresponds to all five cards being the ace of spades (in the sampling-with-replacement case), whether the cards are ordered or unordered.

This introduces the concept of distinguishable and indistinguishable elements. 
Even though order may be important, a new result or arrangement will not be 
obtained if two indistinguishable elements are interchanged. Thus, fewer ordered 
arrangements are possible if some of the items are indistinguishable. We also 
noted earlier that there are fewer distinct results if order is not taken into 
account, but the probability of any one of these unordered results occurring then 
would be greater. Note also that it is common practice to assume that order is
not important when drawing without replacement, unless otherwise specified, 
although we did consider order important in Example 1.6.4. 
PERMUTATIONS AND COMBINATIONS


Some particular formulas that are helpful in counting the number of possible arrangements for some of the cases mentioned will be given. An ordered arrangement of a set of objects is known as a permutation.


The number of permutations of n distinguishable objects is n!. 


Proof 


This follows by applying the multiplication principle. To fill $n$ positions with $n$ distinct objects, the first position may be filled $n$ ways using any one of the $n$ objects, the second position may be filled $n - 1$ ways using any of the remaining $n - 1$ objects, and so on until the last object is placed in the last position. Thus, by the multiplication principle, this operation may be carried out in

$$n(n-1)\cdots 1 = n!$$

ways.
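For small $n$, you can confirm the $n!$ count by listing every ordering. The sketch below is our own illustration using the standard library's `itertools.permutations`.

```python
from itertools import permutations
from math import factorial

# Count the orderings of n distinct labels directly and compare with n!.
for n in range(1, 8):
    assert sum(1 for _ in permutations(range(n))) == factorial(n)

print(factorial(5))  # 120 arrangements of five distinct cards
```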


For example, the number of arrangements of five distinct cards is $5! = 120$. One also may be interested in the number of ways of selecting $r$ objects from $n$ distinct objects and then ordering these $r$ objects.


The number of permutations of $n$ distinct objects taken $r$ at a time is

$${}_{n}P_{r} = \frac{n!}{(n-r)!} \tag{1.6.2}$$


Proof


To fill $r$ positions from $n$ objects, the first position may be filled in $n$ ways using any one of the $n$ objects, the second position may be filled in $n - 1$ ways, and so on until $n - (r - 1)$ objects are left to fill the $r$th position. Thus, the total number of ways of carrying out this operation is

$$n(n-1)(n-2)\cdots\bigl(n-(r-1)\bigr) = \frac{n!}{(n-r)!}$$


The number of permutations of the four letters a, b, c, d taken two at a time is $4!/2! = 12$. These are displayed in Figure 1.11. In picking two out of the four letters, there are six unordered ways to choose two letters from the four, as given by the top row. Each combination of two letters then can be permuted $2!$ ways to get the total of 12 ordered arrangements.


FIGURE 1.11 Permutations of four objects taken two at a time


ab ac ad bc bd cd
ba ca da cb db dc
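Figure 1.11 can be regenerated with the standard library. The sketch below is ours, not the text's. `itertools.combinations` produces the top row, `itertools.permutations` produces all 12 ordered pairs, and `math.perm` evaluates ${}_nP_r$ directly.

```python
from itertools import combinations, permutations
from math import factorial, perm

letters = "abcd"
ordered = ["".join(p) for p in permutations(letters, 2)]
unordered = ["".join(c) for c in combinations(letters, 2)]

print(len(ordered), perm(4, 2), factorial(4) // factorial(2))  # 12 12 12
print(unordered)  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd'], the top row of Figure 1.11
# Each unordered pair can be written in 2! orders: the top and bottom rows.
print(len(unordered) * factorial(2) == len(ordered))  # True
```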


Example 1.6.6
A box contains $n$ tickets, each marked with a different integer, 1, 2, 3, ..., $n$. If three tickets are selected at random without replacement, what is the probability of obtaining tickets with consecutive integers? One possible solution would be to let the sample space consist of all ordered triples $(i, j, k)$, where $i$, $j$, and $k$ are different integers in the range 1 to $n$. The number of such triples is ${}_{n}P_{3} = n!/(n-3)! = n(n-1)(n-2)$. The triples that consist of consecutive integers would be $(1, 2, 3), (2, 3, 4), \ldots, (n-2, n-1, n)$ or any of the triples formed by permuting the entries in these. There would be $3! \cdot (n-2) = 6(n-2)$ such triples. The desired probability is

$$P = \frac{6(n-2)}{n(n-1)(n-2)} = \frac{6}{n(n-1)}$$
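You can check the answer $6/[n(n-1)]$ by enumerating the ordered sample space for several small $n$ with exact rational arithmetic. This is our sketch, not part of the text. The helper name `p_consecutive_ordered` is hypothetical. The test `max(t) - min(t) == 2` works because three *distinct* integers whose range is 2 must be $k, k+1, k+2$ in some order.

```python
from fractions import Fraction
from itertools import permutations

def p_consecutive_ordered(n):
    """Exact probability that 3 tickets drawn without replacement from 1..n
    are consecutive integers, found by enumerating all ordered triples."""
    triples = list(permutations(range(1, n + 1), 3))
    hits = sum(1 for t in triples if max(t) - min(t) == 2)
    return Fraction(hits, len(triples))

for n in range(3, 10):
    assert p_consecutive_ordered(n) == Fraction(6, n * (n - 1))

print(p_consecutive_ordered(6))  # 1/5
```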


If the order of the objects is not important, then one may simply be interested in the number of combinations that are possible when selecting $r$ objects from $n$ distinct objects. The symbol $\binom{n}{r}$ usually is used to denote this number.


The number of combinations of $n$ distinct objects chosen $r$ at a time is

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!} \tag{1.6.3}$$


Proof


As suggested in the preceding example, ${}_{n}P_{r}$ may be interpreted as the number of ways of choosing $r$ objects from $n$ objects and then permuting the $r$ objects $r!$ ways, giving

$${}_{n}P_{r} = \frac{n!}{(n-r)!} = \binom{n}{r} \cdot r!$$

Dividing by $r!$ gives the desired expression for $\binom{n}{r}$.


Thus, the number of combinations of four letters taken two at a time is $\binom{4}{2} = \frac{4!}{2!\,2!} = 6$, as noted above. If order is considered, then the number of arrangements becomes $6 \cdot 2! = 12$ as before. Thus, $\binom{4}{2}$ counts the number of paired symbols in either the first or second row, but not both, in Figure 1.11.
It also is possible to solve the probability problem in Example 1.6.6 using combinations. The sample space would consist of all combinations of the $n$ integers $1, 2, \ldots, n$ taken three at a time. Equivalently, this would be the collection of all subsets of size 3 from the set $\{1, 2, 3, \ldots, n\}$, of which there are

$$\binom{n}{3} = \frac{n!}{3!\,(n-3)!} = \frac{n(n-1)(n-2)}{6}$$

The $n - 2$ combinations or subsets of consecutive integers would be $\{1, 2, 3\}, \{2, 3, 4\}, \ldots, \{n-2, n-1, n\}$. As usual, no distinction should be made between subsets that list the elements in a different order. The resulting probability is

$$P = \frac{n-2}{n(n-1)(n-2)/6} = \frac{6}{n(n-1)}$$

as before.

This shows that some problems can be solved using either combinations or
permutations. Usually, if there is a choice, the combination approach is simpler 
because the sample space is smaller. However, combinations are not appropriate 
in some problems. 


In Example 1.6.6, suppose that the sampling is done with replacement. Now, the same number can be repeated in the triples $(i, j, k)$, so that the sample space has $n^3$ outcomes. There are still only $6(n-2)$ triples of consecutive integers, because repeated integers cannot be consecutive. The probability of consecutive integers in this case is $6(n-2)/n^3$. Integers can be repeated in this case, so the combination approach is not appropriate.
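The with-replacement answer $6(n-2)/n^3$ can be checked the same way, this time over all $n^3$ ordered triples. This is our sketch, and `p_consecutive_with_replacement` is a hypothetical name. The `len(set(t)) == 3` test discards triples with a repeated integer, which can never be consecutive.

```python
from fractions import Fraction
from itertools import product

def p_consecutive_with_replacement(n):
    """Exact probability that 3 draws with replacement from 1..n, listed in
    draw order, form a permutation of three consecutive integers."""
    triples = list(product(range(1, n + 1), repeat=3))
    hits = sum(1 for t in triples
               if len(set(t)) == 3 and max(t) - min(t) == 2)
    return Fraction(hits, len(triples))

for n in range(3, 9):
    assert p_consecutive_with_replacement(n) == Fraction(6 * (n - 2), n ** 3)

print(p_consecutive_with_replacement(5))  # 18/125
```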


A familiar use of the combination notation is in expressing the binomial expansion

$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \tag{1.6.4}$$

In this case, $\binom{n}{k}$ is the coefficient of $a^k b^{n-k}$, and it represents the number of ways of choosing $k$ of the $n$ factors $(a + b)$ from which to use the $a$ term, with the $b$ term being used from the remaining $n - k$ factors.
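The counting interpretation of the coefficient can be checked literally. Pick either `a` or `b` from each of the $n$ factors, then tally how many of the $2^n$ picks use exactly $k$ copies of `a`. The sketch below is our illustration, and `expansion_coefficients` is a hypothetical helper.

```python
from collections import Counter
from itertools import product
from math import comb

def expansion_coefficients(n):
    """Expand (a + b)**n by brute force: choose 'a' or 'b' from each of the
    n factors and tally how many choices use exactly k copies of 'a'."""
    tally = Counter(choice.count("a") for choice in product("ab", repeat=n))
    return [tally[k] for k in range(n + 1)]

print(expansion_coefficients(4))       # [1, 4, 6, 4, 1]
print([comb(4, k) for k in range(5)])  # [1, 4, 6, 4, 1]
```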


The combination concept can be used to determine the number of subsets of a set of $m$ elements. There are $\binom{m}{j}$ ways of choosing $j$ elements from the $m$ elements, so there are $\binom{m}{j}$ subsets of $j$ elements for $j = 0, 1, \ldots, m$. The case $j = 0$ corresponds to the null set and is represented by $\binom{m}{0} = 1$, because $0!$ is defined to be equal to 1, for notational convenience. Thus, setting $a = b = 1$ in equation (1.6.4), the total number of subsets including the null set is given by

$$\sum_{j=0}^{m} \binom{m}{j} = (1+1)^m = 2^m \tag{1.6.5}$$


Example 1.6.10

If five cards are drawn from a deck of cards without replacement, the number of five-card hands is

$$\binom{52}{5} = \frac{52!}{5!\,47!}$$

If order is taken into account as in Example 1.6.4, then the number of ordered five-card hands is

$${}_{52}P_{5} = \binom{52}{5} \cdot 5! = \frac{52!}{47!}$$

Similarly, in Example 1.5.2 the number of ordered two-card hands was given to be

$$\binom{52}{2} \cdot 2! = 52 \cdot 51$$
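Python's `math.comb` and `math.perm` evaluate these counts exactly. The sketch below is ours. It confirms the card-hand figures and the subset identity (1.6.5) for one value of $m$, chosen arbitrarily.

```python
from math import comb, factorial, perm

unordered_hands = comb(52, 5)  # 52! / (5! 47!)
ordered_hands = perm(52, 5)    # 52! / 47!

print(unordered_hands)                                       # 2598960
print(ordered_hands == unordered_hands * factorial(5))       # True
print(perm(52, 2) == comb(52, 2) * factorial(2) == 52 * 51)  # True

# Equation (1.6.5): adding up C(m, j) over j = 0..m gives 2**m subsets.
m = 10
print(sum(comb(m, j) for j in range(m + 1)) == 2 ** m)  # True
```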


INDISTINGUISHABLE OBJECTS 


The discussion to this point has dealt with arrangements of $n$ distinguishable objects. There are also many applications involving objects that are not all distinguishable.


Example 1.6.11

You have five marbles, two black and three white, but otherwise indistinguishable. In Figure 1.12, we represent all the distinguishable arrangements of two black (B) and three white (W) marbles.


FIGURE 1.12 Distinguishable arrangements of five objects, two of one type and three of another


BBWWW BWBWW WBBWW BWWBW WBWBW
WWBBW WWBWB WWWBB WBWWB BWWWB


Notice that arrangements are distinguishable if they differ by exchanging 
marbles of different colors, but not if the exchange involves the same color. We 
will refer to these 10 different arrangements as permutations of the five objects 
even though the objects are not all distinguishable. 
A more general way to count such permutations would be first to introduce labels for the objects, say $B_1, B_2, W_1, W_2, W_3$. There are $5!$ permutations of these distinguishable objects, but within each color there are permutations that we don't want to count. We can compensate by dividing by the number of permutations of black objects ($2!$) and of white objects ($3!$). Thus, the number of permutations of nondistinguishable objects is

$$\frac{5!}{2!\,3!} = 10$$
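The "divide out the same-color swaps" argument has a direct computational counterpart. Generate all $5!$ orderings of the string `"BBWWW"` and keep only the distinct strings. This is our sketch, not the text's.

```python
from itertools import permutations
from math import factorial

# Collapsing the 5! orderings of "BBWWW" into a set discards the orderings
# that only swap marbles of the same color.
arrangements = {"".join(p) for p in permutations("BBWWW")}

print(len(arrangements))                              # 10
print(factorial(5) // (factorial(2) * factorial(3)))  # 10
print(sorted(arrangements))
```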


This is a special case of the following theorem. 


The number of distinguishable permutations of $n$ objects of which $r$ are of one kind and $n - r$ are of another kind is

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!} \tag{1.6.6}$$


Clearly, this concept can be generalized to the case of permuting k types of 
objects. 


The number of permutations of $n$ objects of which $r_1$ are of one kind, $r_2$ of a second kind, ..., $r_k$ of a $k$th kind is

$$\frac{n!}{r_1!\,r_2!\cdots r_k!} \tag{1.6.7}$$


Proof 


This follows from the argument of Example 1.6.11, except with k different colors 
of balls. 


You have 10 marbles: two black, three white, and five red, but otherwise not distinguishable. The number of different permutations is

$$\frac{10!}{2!\,3!\,5!} = 2520$$
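Equation (1.6.7) translates into a one-line function. The sketch below is our illustration, and the name `multinomial` is our own choice. It checks the formula against brute-force enumeration on a small case before applying it to the marble counts.

```python
from itertools import permutations
from math import factorial, prod

def multinomial(*counts):
    """n! / (r_1! r_2! ... r_k!) with n = r_1 + ... + r_k, as in equation (1.6.7)."""
    return factorial(sum(counts)) // prod(factorial(r) for r in counts)

# Brute-force check on a small case: distinct rearrangements of "ABBCC".
assert len(set(permutations("ABBCC"))) == multinomial(1, 2, 2) == 30

print(multinomial(2, 3))     # 10, two black and three white
print(multinomial(2, 3, 5))  # 2520, two black, three white, five red
```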


The notion of permutations of n objects, not all of which are distinguishable, is 
related to yet another type of operation with n distinct objects. 


PARTITIONING


Let us select $r$ objects from $n$ distinct objects and place them in a box or “cell,” and then place the remaining $n - r$ objects in a second cell. Clearly, there are $\binom{n}{r}$ ways of doing this (because permuting the objects within a cell will not produce a new result), and this is referred to as the number of ways of partitioning $n$ objects into two cells with $r$ objects in one cell and $n - r$ in the other. The concept generalizes readily to partitioning $n$ distinct objects into more than two cells.


Theorem 1.6.7

The number of ways of partitioning a set of $n$ objects into $k$ cells with $r_1$ objects in the first cell, $r_2$ in the second cell, and so forth is

$$\frac{n!}{r_1!\,r_2!\cdots r_k!}$$

where $\sum_{i=1}^{k} r_i = n$.


Note that partitioning assumes that the number of objects to be placed in each 
cell is fixed, and that the order in which the objects are placed into cells is not 
considered. 

By successively selecting the objects, the number of partitions also may be expressed as

$$\binom{n}{r_1}\binom{n-r_1}{r_2}\cdots\binom{n-r_1-\cdots-r_{k-1}}{r_k} = \frac{n!}{r_1!\,r_2!\cdots r_k!}$$


How many ways can you distribute 12 different popsicles equally among four children? By Theorem 1.6.7 this is

$$\frac{12!}{3!\,3!\,3!\,3!} = 369{,}600$$

This is also the number of ways of arranging 12 popsicles, of which three are red, three are green, three are orange, and three are yellow, if popsicles of the same color are otherwise indistinguishable.
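The two expressions for the number of partitions can be compared numerically. The sketch below is ours, and both helper names are hypothetical. The first function fills the cells one at a time with binomial coefficients. The second uses the closed form of Theorem 1.6.7.

```python
from math import comb, factorial, prod

def partitions_successive(sizes):
    """Fill cell 1 from all n objects, cell 2 from the n - r_1 left over, and so on."""
    remaining = sum(sizes)
    total = 1
    for r in sizes:
        total *= comb(remaining, r)
        remaining -= r
    return total

def partitions_formula(sizes):
    """Closed form n! / (r_1! r_2! ... r_k!)."""
    return factorial(sum(sizes)) // prod(factorial(r) for r in sizes)

popsicles = [3, 3, 3, 3]  # 12 distinct popsicles, three per child
print(partitions_successive(popsicles), partitions_formula(popsicles))  # 369600 369600
```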


PROBABILITY COMPUTATIONS 


As mentioned earlier, if it can be assumed that all possible outcomes are equally 
likely to occur, then the classical probability concept is useful for assigning prob- 
abilities to events, and the counting techniques reviewed in this section may be 
helpful in computing the number of ways an event may occur. 
Recall that the method of sampling, and assumptions concerning order, whether the items are indistinguishable, and so on, may have an effect on the number of possible outcomes.


A student answers 20 true-false questions at random. The probability of getting 100% on the test is $P(100\%) = 1/2^{20} = 0.00000095$. We wish to know the probability of getting 80% right, that is, answering 16 questions correctly. We do not care which 16 questions are answered correctly, so there are $\binom{20}{16}$ ways of choosing exactly 16 correct answers, and $P(80\%) = \binom{20}{16}\big/2^{20} = 0.0046$.
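The arithmetic in this example can be reproduced exactly with `fractions.Fraction`. The sketch below is ours, not the text's. It converts to floating point only for display.

```python
from fractions import Fraction
from math import comb

total = 2 ** 20                       # equally likely answer patterns
p_perfect = Fraction(1, total)        # all 20 correct
p_80 = Fraction(comb(20, 16), total)  # exactly 16 of 20 correct

print(float(p_perfect))       # 9.5367431640625e-07
print(comb(20, 16))           # 4845
print(round(float(p_80), 4))  # 0.0046
```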


Sampling Without Replacement A box contains 10 black marbles and 20 white marbles, and five marbles are selected without replacement. The probability of getting exactly two black marbles is

$$P(\text{exactly 2 black}) = \frac{\binom{10}{2}\binom{20}{3}}{\binom{30}{5}} = 0.360 \tag{1.6.8}$$

There are $\binom{30}{5}$ total possible outcomes. Also there are $\binom{10}{2}$ ways of choosing the two black marbles from the 10 black marbles, and $\binom{20}{3}$ ways of choosing the remaining three white marbles from the 20 white marbles. By the multiplication principle, there are $\binom{10}{2}\binom{20}{3}$ ways of achieving the event of getting two black marbles. Note that order was not considered important in this problem, although all 30 marbles are considered distinct in this computation, both in considering the total number of outcomes in the sample space and in considering how many outcomes correspond to the desired event occurring. Even though the question does not distinguish between the order of outcomes, it is possible to consider the question relative to the larger sample space of equally likely ordered outcomes. In that case one would have ${}_{30}P_{5} = \binom{30}{5} \cdot 5!$ possible outcomes and

$$P(\text{exactly 2 black}) = \frac{\binom{10}{2}\binom{20}{3} \cdot 5!}{{}_{30}P_{5}} \tag{1.6.9}$$

which gives the same answer as before.
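The unordered computation (1.6.8) and the ordered one (1.6.9) can be shown to agree exactly. The sketch below is our illustration. The variable names are ours, and exact fractions avoid any rounding question.

```python
from fractions import Fraction
from math import comb, factorial, perm

black, white, drawn, k = 10, 20, 5, 2

# Equation (1.6.8): unordered sample space of C(30, 5) equally likely hands.
p_unordered = Fraction(comb(black, k) * comb(white, drawn - k),
                       comb(black + white, drawn))

# Ordered sample space of 30P5 outcomes: every unordered hand appears 5! times.
p_ordered = Fraction(comb(black, k) * comb(white, drawn - k) * factorial(drawn),
                     perm(black + white, drawn))

print(p_unordered == p_ordered)      # True
print(round(float(p_unordered), 3))  # 0.36
```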
It also is possible to attack this problem by the conditional probability approach discussed in Section 1.5. First consider the probability of getting the outcome BBWWW in the specified order. Here we choose to use the distinction between B and W but not the distinction within the B’s or within the W’s. By the conditional probability approach, this joint probability may be expressed as

$$P(BBWWW) = \frac{10}{30} \cdot \frac{9}{29} \cdot \frac{20}{28} \cdot \frac{19}{27} \cdot \frac{18}{26}$$

Similarly,

$$P(BWBWW) = \frac{10}{30} \cdot \frac{20}{29} \cdot \frac{9}{28} \cdot \frac{19}{27} \cdot \frac{18}{26}$$

and so on. Thus, each particular ordering has the same probability. If we do not wish to distinguish between the ordering of the black and white marbles, then

$$P(\text{exactly 2 black}) = \binom{5}{2} \frac{10}{30} \cdot \frac{9}{29} \cdot \frac{20}{28} \cdot \frac{19}{27} \cdot \frac{18}{26} \tag{1.6.10}$$

which again is the same as equation (1.6.8). That is, there are $\binom{5}{2} = 10$ different particular orderings that have two black and three white marbles (see Figure 1.12). One could consider $\binom{5}{2}$ as the number of ways of choosing two positions out of the five positions in which to place two black marbles. If a particular order is not required, the probability of a successful outcome is greater.
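The conditional-probability argument can be carried out mechanically. Multiply the draw-by-draw conditional probabilities for each of the 10 orderings, confirm that all 10 products are equal, and add them up. This is our sketch, and `sequence_probability` is a hypothetical helper. Its defaults of 10 black and 20 white match this example.

```python
from fractions import Fraction
from itertools import permutations

def sequence_probability(seq, black=10, white=20):
    """Probability of one specific B/W draw order without replacement,
    built up as a chain of conditional probabilities."""
    p = Fraction(1)
    b, w = black, white
    for color in seq:
        if color == "B":
            p *= Fraction(b, b + w)
            b -= 1
        else:
            p *= Fraction(w, b + w)
            w -= 1
    return p

orders = {"".join(s) for s in permutations("BBWWW")}  # the 10 orderings of Figure 1.12
probs = {sequence_probability(s) for s in orders}

print(len(orders), len(probs))  # 10 1  (every ordering has the same probability)
total = sum(sequence_probability(s) for s in orders)
print(round(float(total), 3))   # 0.36
```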
We could continue to consider all 30 marbles distinct in this framework, but because only the order between black and white was considered in computing a particular sequence, it follows that there are only $\binom{5}{2}$ distinguishable sequences rather than $5!$ sequences. Thus, although two black marbles may be distinct, permuting them does not produce a different result: the order of the black marbles within themselves was not considered important when defining the ordered sequences; only the order between black and white was considered. Thus the coefficient $\binom{5}{2}$ could also be interpreted as the number of permutations of five things of which two were alike and three were alike (see Figure 1.12).

Thus, we have seen that it is possible to think of the black and white marbles 
as being indistinguishable within themselves in this problem, and the same value 
for P(exactly 2 black) is obtained; however, the computation is no longer carried 
out over an original basic sample space of equally likely outcomes. For example, 
on the first draw one would just have the two possible outcomes, B and W, 
although these two outcomes obviously would not be equally likely, but rather 
P(B) = 10/30 and P(W) = 20/30. Indeed, the assumption that the black marbles 
and white marbles are indistinguishable within themselves appears more natural
in the conditional probability approach. Nevertheless, the distinctness assump- 
tion is a convenient aid in the first approach to obtain the more basic equally 
likely sample space, even though the question itself does not require dis- 
tinguishing within a color. 


Sampling with Replacement If the five marbles are drawn with replacement in Example 1.6.15, then the conditional probability approach seems most natural and analogous to (1.6.10),

$$P(\text{exactly 2 black}) = \binom{5}{2}\left(\frac{1}{3}\right)^{2}\left(\frac{2}{3}\right)^{3} \tag{1.6.11}$$

Of course, in this case the outcomes on each draw are independent.
If one chooses to use the classical approach in this case, it is more convenient to consider the sample space of $30^5$ equally likely ordered outcomes; in Example 1.6.15 it is more convenient just to consider the sample space of $\binom{30}{5}$ unordered outcomes as in equation (1.6.8), rather than the ordered outcomes as in equation (1.6.9). For event A, “exactly 2 black,” one then has

$$P(A) = \frac{\binom{5}{2}\,10^{2}\,20^{3}}{30^{5}}$$

The form in this case remains quite similar to equation (1.6.11), although the argument would be somewhat different. There are $\binom{5}{2}$ different patterns in which the ordered arrangements may contain two black and three white marbles, and for each pattern there are $10^2\,20^3$ distinct arrangements that can be formed in this sample space.
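The two routes to the with-replacement answer can be compared exactly. The sketch below is ours, not the text's. The first line uses the independence form (1.6.11). The second counts favorable ordered outcomes pattern by pattern: a B/W pattern with two B's corresponds to $10^2 \cdot 20^3$ of the $30^5$ equally likely ordered draws.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Sampling with replacement: the five draws are independent, P(B) = 1/3.
p_formula = comb(5, 2) * Fraction(1, 3) ** 2 * Fraction(2, 3) ** 3

# Classical count over the 30**5 ordered outcomes, grouped by B/W pattern:
# a pattern with two B's and three W's covers 10**2 * 20**3 ordered draws.
favorable = sum(10 ** s.count("B") * 20 ** s.count("W")
                for s in product("BW", repeat=5) if s.count("B") == 2)
p_classical = Fraction(favorable, 30 ** 5)

print(p_formula, p_classical)  # 80/243 80/243
```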


Because many diverse types of probability problems can be stated, a unique 
approach often may be needed to identify the mutually exclusive ways that an 
event can occur in such a manner that these ways can be readily counted. 
However, certain classical problems (such as those illustrated in Examples 1.6.15 
and 1.6.16) can be recognized easily and general probability distribution functions
can be determined for them. For these problems, the individual counting prob- 
lems need not be analyzed so carefully each time. 


SUMMARY 


The purpose of this chapter was to develop the concept of probability in order to 
model phenomena where the observed result is uncertain before experimentation. 
The basic approach involves defining the sample space as the set of all possible 
outcomes of the experiment, and defining an event mathematically as the set of
outcomes associated with occurrence of the event. The primary motivation for 
assigning probability to an event involves the long-term relative frequency inter- 
pretation. However, the approach of defining probability in terms of a simple set 
of axioms is more general, and it allows the possibility of other methods of 
assignment and other interpretations of probability. This approach also makes it 
possible to derive general properties of probability. 

The notion of conditional probability allows the introduction of additional information concerning the occurrence of one event when assigning probability to another. If the probability assigned to one event is not affected by the information that another event has occurred, then the events are considered independent. Care should be taken not to confuse the concepts of independent and mutually exclusive events. Specifically, mutually exclusive events with nonzero probabilities are dependent, because the occurrence of one precludes the occurrence of the other. In other words, the conditional probability of one given the other is zero.

One of the primary methods of assigning probability, which applies in the case 
of a finite sample space, is based on the assumption that all outcomes are equally 
likely to occur. To implement this method, it is useful to have techniques for 
counting the number of outcomes in an event. The primary techniques include formulas for counting ordered arrangements of objects (permutations) and unordered sets of objects (combinations).

To express probability models by general formulas, it is convenient first to introduce the concept of a “random variable” and a function that describes the
probability distribution. These concepts will be discussed in the next chapter, and 
general solutions then can be provided for some of the basic counting problems 
most often encountered. 


EXERCISES 


1. A gum-ball machine gives out a red, a black, or a green gum ball.
(a) Describe an appropriate sample space.
(b) List all possible events.
(c) If R is the event “red,” then list the outcomes in $R'$.
(d) If G is the event “green,” then what is $R \cap G$?


2. Two gum balls are obtained from the machine in Exercise 1 from two trials. The order of
the outcomes is important. Assume that at least two balls of each color are in the machine.
(a) What is an appropriate sample space?
(b) How many total possible events are there that contain eight outcomes?
(c) Express the following events as unions of elementary events: $C_1$ = getting a red ball
on the first trial, $C_2$ = getting at least one red ball, $C_1 \cap C_2$, $C_1' \cap C_2$.


3. There are four basic blood groups: O, A, B, and AB. Ordinarily, anyone can receive the
blood of a donor from their own group. Also, anyone can receive the blood of a donor
from the O group, and any of the four types can be used by a recipient from the AB group.
All other possibilities are undesirable. An experiment consists of drawing a pint of blood
and determining its type for each of the next two donors who enter a blood bank.
(a) List the possible (ordered) outcomes of this experiment.
(b) List the outcomes corresponding to the event that the second donor can receive the
blood of the first donor.
(c) List the outcomes corresponding to the event that each donor can receive the blood
of the other.


4. An experiment consists of drawing gum balls from a gum-ball machine until a red ball is
obtained. Describe a sample space for this experiment. 


5. The number of alpha particles emitted by a radioactive sample in a fixed time interval is
counted.
(a) Give a sample space for this experiment.
(b) The elapsed time is measured until the first alpha particle is emitted. Give a sample
space for this experiment.


6. An experiment is conducted to determine what fraction of a piece of metal is gold. Give a
sample space for this experiment. 


7. A randomly selected car battery is tested and the time of failure is recorded. Give an
appropriate sample space for this experiment.


8. We obtain 100 gum balls from a machine, and we get 20 red (R), 30 black (B), and 50 green
(G) gum balls.
(a) Can we use, as a probability model for the color of a gum ball from the machine,
one given by $p_1 = P(R) = 0.2$, $p_2 = P(B) = 0.3$, and $p_3 = P(G) = 0.5$?
(b) Suppose we later notice that some yellow (Y) gum balls are also in the machine.
Could we use as a model $p_1 = 0.2$, $p_2 = 0.3$, $p_3 = 0.5$, and $p_4 = P(Y) = 0.1$?


9. In Exercise 2, suppose that each of the nine possible outcomes in the sample space is
equally likely to occur. Compute each of the following:


(a) P(both red).
(b) $P(C_1)$.
(c) $P(C_2)$.
(d) $P(C_1 \cap C_2)$.
(e) $P(C_1' \cap C_2)$.
(f) $P(C_1 \cup C_2)$.


10. Consider Exercise 3. Suppose, for a particular racial group, the four blood types are
equally likely to occur.
(a) Compute the probability that the second donor can receive blood from the first 
donor. 
(b) Compute the probability that each donor can receive blood from the other. 
(c) Compute the probability that neither can receive blood from the other. 


11. Prove that $P(\emptyset) = 0$. Hint: Let $A_i = \emptyset$ for all $i$ in equation (1.3.3).
12. Prove equation (1.3.5). Hint: Let $A_i = \emptyset$ for all $i > k$ in equation (1.3.3).


13. When an experiment is performed, one and only one of the events $A_1$, $A_2$, or $A_3$ will
occur. Find $P(A_1)$, $P(A_2)$, and $P(A_3)$ under each of the following assumptions:


(a) $P(A_1) = P(A_2) = P(A_3)$.

(b) $P(A_1) = P(A_2)$ and $P(A_3) = 1/2$.

(c) $P(A_1) = 2P(A_2) = 3P(A_3)$.
14. A balanced coin is tossed four times. List the possible outcomes and compute the
probability of each of the following events:

(a) exactly three heads.

(b) at least one head.

(c) the number of heads equals the number of tails.

(d) the number of heads exceeds the number of tails.
15. Two part-time teachers are hired by the mathematics department and each is assigned at
random to teach a single course, in trigonometry, algebra, or calculus. List the outcomes
in the sample space and find the probability that they will teach different courses. Assume
that more than one section of each course is offered.


16. Prove Theorem 1.4.4. Hint: Write $A \cup B \cup C = (A \cup B) \cup C$ and apply Theorem 1.4.3.


17. Prove Theorem 1.4.5. Hint: If $A \subset B$, then we can write $B = A \cup (B \cap A')$, a disjoint
union.


18. If A and B are events, show that:
(a) $P(A \cap B') = P(A) - P(A \cap B)$.
(b) $P(A \cup B) = 1 - P(A' \cap B')$.


19. Let $P(A) = P(B) = 1/3$ and $P(A \cap B) = 1/10$. Find the following:

(a) $P(B')$.

(b) $P(A \cup B')$.

(c) $P(B \cap A)$.

(d) $P(A' \cup B')$.
20. Let $P(A) = 1/2$, $P(B) = 1/8$, and $P(C) = 1/4$, where A, B, and C are mutually exclusive.
Find the following:

(a) $P(A \cup B \cup C)$.

(b) $P(A' \cap B' \cap C')$.


21. The event that exactly one of the events A or B occurs can be represented as
$(A \cap B') \cup (A' \cap B)$. Show that

$$P[(A \cap B') \cup (A' \cap B)] = P(A) + P(B) - 2P(A \cap B)$$


22. A track star runs two races on a certain day. The probability that he wins the first race is
0.7, the probability that he wins the second race is 0.6, and the probability that he wins
both races is 0.5. Find the probability that:

(a) he wins at least one race.
(b) he wins exactly one race.
(c) he wins neither race.


23. A certain family owns two television sets, one color and one black-and-white set. Let A be
the event the color set is on and B the event the black-and-white set is on. If P(A) = 0.4, 
P(B) = 0.3, and P(A U B) = 0.5, find the probability of each event: 

(a) both are on. 

(b) the color set is on and the other is off. 

(c) exactly one set is on. 

(d) neither set is on. 


24. Suppose $P(A_i) = 1/(3 + i)$ for $i = 1, 2, 3, 4$. Find an upper bound for
$P(A_1 \cup A_2 \cup A_3 \cup A_4)$.


25. A box contains three good cards and two bad (penalty) cards. Player A chooses a card and
then player B chooses a card. Compute the following probabilities:

(a) P(A good).

(b) P(B good | A good).

(c) P(B good | A bad).

(d) $P(B \text{ good} \cap A \text{ good})$ using (1.5.5).

(e) Write out the sample space of ordered pairs and compute $P(B \text{ good} \cap A \text{ good})$ and
P(B good | A good) directly from definitions. (Note: Assume that the cards are
distinct.)

(f) P(B good).

(g) P(A good | B good).


26. Repeat Exercise 25, but assume that player A looks at his card, replaces it in the box, and
remixes the cards before player B draws.


27. A bag contains five blue balls and three red balls. A boy draws a ball, and then draws
another without replacement. Compute the following probabilities:

(a) P(2 blue balls).

(b) P(1 blue and 1 red).

(c) P(at least 1 blue).

(d) P(2 red balls).


28. In Exercise 27, suppose a third ball is drawn without replacement. Find:
(a) P(no red balls left after third draw).
(b) P(1 red ball left).
(c) P(first red ball on last draw).
(d) P(a red ball on last draw).


29. A family has two children. It is known that at least one is a boy. What is the probability
that the family has two boys, given at least one boy? Assume P(boy) = 1/2.
30. Two cards are drawn from a deck of cards without replacement.
(a) What is the probability that the second card is a heart, given that the first card is a
heart?
(b) What is the probability that both cards are hearts, given that at least one is a heart?


31. A box contains five green balls, three black balls, and seven red balls. Two balls are
selected at random without replacement from the box. What is the probability that: 
(a) both balls are red? 
(b) both balls are the same color? 


32. A softball team has three pitchers, A, B, and C, with winning percentages of 0.4, 0.6, and
0.8, respectively. These pitchers pitch with frequency 2, 3, and 5 out of every 10 games,
respectively. In other words, for a randomly selected game, P(A) = 0.2, P(B) = 0.3, and
P(C) = 0.5. Find:

(a) P(team wins game) = P(W).

(b) P(A pitched game | team won) = $P(A \mid W)$.


33. One card is selected from a deck of 52 cards and placed in a second deck. A card then is
selected from the second deck.
(a) What is the probability the second card is an ace?
(b) If the first card is placed into a deck of 54 cards containing two jokers, then what is
the probability that a card drawn from the second deck is an ace?
(c) Given that an ace was drawn from the second deck in (b), what is the conditional
probability that an ace was transferred?


34. A pocket contains three coins, one of which has a head on both sides, while the other two
coins are normal. A coin is chosen at random from the pocket and tossed three times.
(a) Find the probability of obtaining three heads.
(b) If a head turns up all three times, what is the probability that this is the two-headed
coin?


35. In a bolt factory, machines 1, 2, and 3 respectively produce 20%, 30%, and 50% of the
total output. Of their respective outputs, 5%, 3%, and 2% are defective. A bolt is selected 
at random. 

(a) What is the probability that it is defective? 

(b) Given that it is defective, what is the probability that it was made by machine 1? 


36. Drawer A contains five pennies and three dimes, while drawer B contains three pennies
and seven dimes. A drawer is selected at random, and a coin is selected at random from 
that drawer. 

(a) Find the probability of selecting a dime. 

(b) Suppose a dime is obtained. What is the probability that it came from drawer B? 


37. Let $P(A) = 0.4$ and $P(A \cup B) = 0.6$.
(a) For what value of P(B) are A and B mutually exclusive? 
(b) For what value of P(B) are A and B independent? 
38. Prove Theorem 1.5.5. Hint: Use Exercise 18.


39. Three independent components are hooked in series. Each component fails with
probability p. What is the probability that the system does not fail?


40. Three independent components are hooked in parallel. Each component fails with 
probability p. What is the probability that the system does not fail? 


41. Consider the following system with assigned probabilities of malfunction for the five
components. Assume that malfunctions occur independently. 


What is the probability the system does not malfunction? 


42. The probability that a marksman hits a target is 0.9 on any given shot, and repeated shots
are independent. He has two pistols; one contains two bullets and the other contains only
one bullet. He selects a pistol at random and shoots at the target until the pistol is empty.
What is the probability of hitting the target exactly one time?


43. Rework Exercise 27, assuming that the balls are chosen with replacement.


44. In a marble game a shooter may (A) miss, (B) hit one marble out and stick in the ring, or
(C) hit one marble out and leave the ring. If B occurs, the shooter shoots again.

(a) If $P(A) = p_1$, $P(B) = p_2$, and $P(C) = p_3$, and these probabilities do not change from
shot to shot, then express the probability of getting out exactly three marbles on one
turn.

(b) What is the probability of getting out exactly x marbles in one turn?

(c) Show that the probability of getting one marble is greater than the probability of
getting zero marbles if

$$p_1 < \frac{p_3}{1 - p_2}$$
45. In the marble game in Exercise 44, suppose the probabilities depend on the number of
marbles left in the ring, N. Let

$$ P(A) = \frac{1}{N+1}, \qquad P(B) = \frac{0.8N}{N+1}, \qquad P(C) = \frac{0.2N}{N+1} $$

Rework Exercise 44 under this assumption.
46. A, B, and C are events such that P(A) = 1/3, P(B) = 1/4, and P(C) = 1/5. Find
P(A ∪ B ∪ C) under each of the following assumptions:

(a) If A, B, and C are mutually exclusive. 

(b) If A, B, and C are independent. 


47. A bowl contains four lottery tickets with the numbers 111, 221, 212, and 122. One ticket is
drawn at random from the bowl, and A_i is the event “2 in the ith place,” i = 1, 2, 3.
Determine whether A_1, A_2, and A_3 are independent.


48. Code words are formed from the letters A through Z.
(a) How many 26-letter words can be formed without repeating any letters? 
(b) How many five-letter words can be formed without repeating any letters? 
(c) How many five-letter words can be formed if letters can be repeated? 


49. License plate numbers consist of two letters followed by a four-digit number, such as
SB7904 or AY1637. 

(a) How many different plates are possible if letters and digits can be repeated? 

(b) Answer (a) if letters can be repeated but digits cannot. 

(c) How many of the plates in (b) have a four-digit number that is greater than 5500?


50. In how many ways can three boys and three girls sit in a row if boys and girls must
alternate? 


51. How many odd three-digit numbers can be formed from the digits 0, 1, 2, 3, 4 if digits can
be repeated, but the first digit cannot be zero? 


52. Suppose that from 10 distinct objects, four are chosen at random with replacement.
(a) What is the probability that no object is chosen more than once? 
(b) What is the probability that at least one object is chosen more than once? 


53. A restaurant advertises 256 types of nachos. How many topping ingredients must be
available to meet this claim if plain corn chips count as one type?


54. A club consists of 17 men and 13 women, and a committee of five members must be
chosen. 

(a) How many committees are possible? 

(b) How many committees are possible with three men and two women? 

(c) Answer (b) if a particular man must be included. 


55. A football coach has 49 players available for duty on a special kick-receiving team.
(a) If 11 must be chosen to play on this special team, how many different teams are
possible?
(b) If the 49 include 24 offensive and 25 defensive players, what is the probability that a 
randomly selected team has five offensive and six defensive players? 


56. For positive integers n > r, show the following:


© Oele7 
—1 —41
0 O22) 
57. Provide solutions for the following sums:
4 4 4 
© ()+Q)+Q) 
6 6 6 6 
© |)+G)+@)+() 
(o) y ea Hint: Use Exercise 56(b). 
i=0 


58. Seven people show up to apply for jobs as cashiers at a discount store. 

(a) If only three jobs are available, in how many ways can three be selected from the 
seven applicants? 

(b) Suppose there are three male and four female applicants, and all seven are equally 
qualified, so the three jobs are filled at random. What is the probability that the 
three hired are all of the same sex? 

(c) In how many different ways could the seven applicants be lined up while waiting for
an interview? 

(d) If there are four females and three males, in how many ways can the applicants be 
lined up if the first three are female? 


59. The club in Exercise 54 must elect three officers: president, vice-president, and secretary.
How many different ways can this turn out? 

60. How many ways can 10 students be lined up to get on a bus if a particular pair of students
refuse to follow each other in line? 


61. Each student in a class of size n was born in a year with 365 days, and each reports his or
her birth date (month and day, but not year).
(a) How many ways can this happen?
(b) How many ways can this happen with no repeated birth dates?
(c) What is the probability of no matching birth dates?
(d) In a class of 23 students, what is the probability of at least one repeated birth date?


62. A kindergarten student has 12 crayons. 
(a) How many ways can three blue, four red, and five green crayons be arranged in a
row?
(b) How many ways can 12 distinct crayons be placed in three boxes containing 3, 4,
and 5 crayons, respectively?


63. How many ways can you partition 26 letters into three boxes containing 9, 11, and 6
letters?


64. How many ways can you permute 9 a’s, 11 b’s, and 6 c’s? 
65. A contest consists of finding all of the code words that can be formed from the letters in
the name “ATARI.” Assume that the letter A can be used twice, but the others at most
once.

(a) How many five-letter words can be formed?

(b) How many two-letter words can be formed?

(c) How many words can be formed?


66. Three buses are available to transport 60 students on a field trip. The buses seat 15, 20,
and 25 passengers, respectively. How many different ways can the students be loaded on
the buses?


67. A certain machine has nine switches mounted in a row. Each switch has three positions, a,
b, and c. 

(a) How many different settings are possible? 

(b) Answer (a) if each position is used three times. 


68. Suppose 14 students have tickets for a concert.

(a) Three students (Bob, Jim, and Tom) own cars and will provide transportation to the 
concert. Bob’s car has room for three passengers (nondrivers), while the cars owned 
by Jim and Tom each has room for four passengers. In how many different ways can 
the 11 passengers be loaded into the cars? 

(b) At the concert hall the students are seated together in a row. If they take their seats
in random order, find the probability that the three students who drove their cars 
have adjoining seats. 


69. Suppose the winning number in a lottery is a four-digit number determined by drawing
four slips of paper (without replacement) from a box that contains nine slips numbered 
consecutively 1 through 9 and then recording the digits in order from smallest to largest. 
(a) How many different lottery numbers are possible? 
(b) Find the probability that the winning number has only odd digits. 
(c) How many different lottery numbers are possible if the digits are recorded in the 
order they were drawn? 


70. Consider four dice A, B, C, and D numbered as follows: A has 4 on four faces and 0 on
two faces; B has 3 on all six faces; C has 2 on four faces and 6 on two faces; and D has 5
on three faces and 1 on the other three faces. Suppose the statement A > B means that the
face showing on A is greater than on B, and so forth. Show that P[A > B] = P[B > C]
= P[C > D] = P[D > A] = 2/3. In other words, if an opponent chooses a die, you can
always select one that will defeat him with probability 2/3. 


71. A laboratory test for steroid use in professional athletes has detection rates given in the
following table:

                   Test Result
Steroid Use        +       −
Yes               .90     .10
No                .01     .99

If the rate of steroid use among professional athletes is 1 in 50:
(a) What is the probability that a professional athlete chosen at random will have a 
negative test result for steroid use? 
(b) If the athlete tests positive, what is the probability that he has actually been using 
steroids? 


72. A box contains four disks that have different colors on each side. Disk 1 is red and green,
disk 2 is red and white, disk 3 is red and black, and disk 4 is green and white. One disk is
selected at random from the box. Define events as follows: A = one side is red, B = one
side is green, C = one side is white, and D = one side is black.

(a) Are A and B independent events? Why or why not? 

(b) Are B and C independent events? Why or why not? 

(c) Are any pairs of events mutually exclusive? Which ones? 


RANDOM VARIABLES 
AND THEIR 
DISTRIBUTIONS 


2.1 INTRODUCTION


Our purpose is to develop mathematical models for describing the probabilities 
of outcomes or events occurring in a sample space. Because mathematical equations
are expressed in terms of numerical values rather than as heads, colors, or
other properties, it is convenient to define a function, known as a random vari- 
able, that associates each outcome in the experiment with a real number. We then 
can express the probability model for the experiment in terms of this associated 
random variable. Of course, in many experiments the results of interest already 
are numerical quantities, and in that case the natural function to use as the 
random variable would be the identity function. 


Definition 2.1.1


Random Variable A random variable, say X, is a function defined over a sample
space, S, that associates a real number, X(e) = x, with each possible outcome e in S.
Capital letters, such as X, Y, and Z, will be used to denote random variables.
The lowercase letters x, y, z, ... will be used to denote possible values that the
corresponding random variables can attain. For mathematical reasons, it will be
necessary to restrict the types of functions that are considered to be random
variables. We will discuss this point after the following example.

Example 2.1.1
A four-sided (tetrahedral) die has a different number (1, 2, 3, or 4) affixed to
each side. On any given roll, each of the four numbers is equally likely to occur.
A game consists of rolling the die twice, and the score is the maximum of the two
numbers that occur. Although the score cannot be predicted, we can determine
the set of possible values and define a random variable. In particular, if e = (i, j),
where i, j ∈ {1, 2, 3, 4}, then X(e) = max(i, j). The sample space, S, and X are
illustrated in Figure 2.1.

FIGURE 2.1 Sample space for two rolls of a four-sided die


Each of the events B_1, B_2, B_3, and B_4 of S contains the pairs (i, j) that have a
common maximum. In other words, X has value x = 1 over B_1, x = 2 over B_2,
x = 3 over B_3, and x = 4 over B_4.

Other random variables also could be considered. For example, the random 
variable Y(e) = i + j represents the total on the two rolls. 
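The construction in Example 2.1.1 can be checked by brute force. The Python sketch below, offered only as an illustration, enumerates the 16 equally likely outcomes e = (i, j), applies X(e) = max(i, j), and collects the events B_x = {e ∈ S : X(e) = x}. The names `S`, `X`, `B`, and `pdf` are our own choices, and exact `Fraction` arithmetic is used so that the probabilities come out as 1/16, 3/16, 5/16, and 7/16, the values listed in Table 2.1.

```python
from fractions import Fraction
from itertools import product

# Sample space S: ordered pairs (i, j) from two rolls of a four-sided die.
S = list(product(range(1, 5), repeat=2))

def X(e):
    """The random variable X(e) = max(i, j) for the outcome e = (i, j)."""
    i, j = e
    return max(i, j)

# Group the outcomes into the events B_x = {e in S : X(e) = x}.
B = {}
for e in S:
    B.setdefault(X(e), []).append(e)

# All 16 outcomes are equally likely, so P(B_x) = |B_x| / 16.
pdf = {x: Fraction(len(B[x]), len(S)) for x in sorted(B)}

for x, p in pdf.items():
    print(x, p, B[x])
```

Replacing `max(i, j)` with `i + j` gives the distribution of the total Y in the same way.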


The concept of a random variable permits us to associate with any sample 
space, S, a sample space that is a set of real numbers, and in which the events of 
interest are subsets of real numbers. If such a real-valued event is denoted by A, 
then we would want the associated set

$$ B = \{e \mid e \in S \text{ and } X(e) \in A\} \qquad (2.1.1) $$


to be an event in the underlying sample space S. Even though A and B are 
subsets of different spaces, they usually are referred to as equivalent events, and 
we write 


$$ P[X \in A] = P(B) \qquad (2.1.2) $$


The notation P_X(A) sometimes is used instead of P[X ∈ A] in equation (2.1.2).
This defines a set function on the collection of real-valued events, and it can be
shown to satisfy the three basic conditions of a probability set function, as given
by Definition 1.3.1.

Although the random variable X is defined as a function of e, it usually is
possible to express the events of interest only in terms of the real values that X
assumes. Thus, our notation usually will suppress the dependence on the outcomes
in S, as we have done in equation (2.1.2).

For instance, in Example 2.1.1, if we were interested in the event of obtaining a
score of “at most 3,” this would correspond to X = 1, 2, or 3, or X ∈ {1, 2, 3}.
Another possibility would be to represent the event in terms of some interval that
contains the values 1, 2, and 3 but not 4, such as A = (−∞, 3]. The associated
equivalent event in S is B = B_1 ∪ B_2 ∪ B_3, and the probability is
P[X ∈ A] = P(B) = 1/16 + 3/16 + 5/16 = 9/16. A convenient notation for
P[X ∈ A], in this example, is P[X ≤ 3]. Actually, any other real event containing
1, 2, and 3 but not 4 could be used in this way, but intervals, and especially those
of the form (−∞, x], will be of special importance in developing the properties of
random variables.
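Equations (2.1.1) and (2.1.2) translate almost line for line into code: to get P[X ∈ A], collect the outcomes whose X-value lies in A and add their probabilities. In this sketch the real event A is passed as a Python predicate, a design choice of ours rather than anything in the text; it confirms that A = {1, 2, 3} and A = (−∞, 3] pick out the same equivalent event B and the same probability 9/16.

```python
from fractions import Fraction
from itertools import product

# Outcomes e = (i, j) for two rolls of a four-sided die, each with probability 1/16.
S = list(product(range(1, 5), repeat=2))
P_OUTCOME = Fraction(1, len(S))

def X(e):
    """X(e) = max(i, j), as in Example 2.1.1."""
    return max(e)

def prob_X_in(A):
    """Return P[X in A] = P(B) and the equivalent event B = {e in S : X(e) in A}.

    A is supplied as a predicate on real numbers (an illustrative design choice).
    """
    B = [e for e in S if A(X(e))]
    return P_OUTCOME * len(B), B

p_set, B_set = prob_X_in(lambda x: x in {1, 2, 3})   # A = {1, 2, 3}
p_int, B_int = prob_X_in(lambda x: x <= 3)           # A = (-inf, 3]

print(p_set, p_int)    # 9/16 9/16
print(B_set == B_int)  # True: both choices of A give the same event B
```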

As mentioned in Section 1.3, if the probabilities can be determined for each 
elementary event in a discrete sample space, then the probability of any event can
be calculated from these by expressing the event as a union of mutually exclusive 
elementary events, and summing over their probabilities. 

A more general approach for assigning probabilities to events in a real sample
space can be based on assigning probabilities to intervals of the form (−∞, x]
for all real numbers x. Thus, we will consider as random variables only functions
X that satisfy the requirements that, for all real x, sets of the form

$$ B = [X \le x] = \{e \mid e \in S \text{ and } X(e) \in (-\infty, x]\} \qquad (2.1.3) $$


are events in the sample space S. The probabilities of other real events can be
evaluated in terms of the probabilities assigned to such intervals. For example,
for the game of Example 2.1.1, we have determined that P[X ≤ 3] = 9/16, and it
also follows, by a similar argument, that P[X ≤ 2] = 1/4. Because (−∞, 2] contains
1 and 2 but not 3, and (−∞, 3] = (−∞, 2] ∪ (2, 3], it follows that
P[X = 3] = P[X ≤ 3] − P[X ≤ 2] = 9/16 − 1/4 = 5/16.

Other examples of random variables can be based on the sampling problems of 
Section 1.6. 
Example 2.1.2
In Example 1.6.15, we discussed several alternative approaches for computing the
probability of obtaining “exactly two black” marbles, when selecting five (without 
replacement) from a collection of 10 black and 20 white marbles. Suppose we are 
concerned with the general problem of obtaining x black marbles, for arbitrary x. 
Our approach will be to define a random variable X as the number of black 
marbles in the sample, and to determine the probability P[X = x] for every pos- 
sible value x. This is easily accomplished with the approach given by equation 
(1.6.8), and the result is 


$$ P[X = x] = \frac{\binom{10}{x}\binom{20}{5-x}}{\binom{30}{5}} \qquad x = 0, 1, \ldots, 5 \qquad (2.1.4) $$
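Equation (2.1.4) is easy to tabulate with exact arithmetic. The sketch below assumes Python 3.8 or later for `math.comb`; the helper name `pdf_black` and its default arguments (10 black, 20 white, samples of 5) are illustrative choices matching Example 2.1.2. It evaluates P[X = x] for x = 0, ..., 5 and confirms that the six probabilities add to 1; P[X = 2] reduces to 950/2639, about 0.360.

```python
from fractions import Fraction
from math import comb

def pdf_black(x, black=10, white=20, n=5):
    """P[X = x] from equation (2.1.4): x black marbles among n drawn without
    replacement from `black` black and `white` white marbles."""
    return Fraction(comb(black, x) * comb(white, n - x), comb(black + white, n))

dist = {x: pdf_black(x) for x in range(6)}   # x = 0, 1, ..., 5

for x, p in dist.items():
    print(x, p, round(float(p), 4))
print(sum(dist.values()))   # 1
```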


Random variables that arise from counting operations, such as the random
variables in Examples 2.1.1 and 2.1.2, are integer-valued. Integer-valued random
variables are examples of an important special type known as discrete random
variables.


2.2 DISCRETE RANDOM VARIABLES


Definition 2.2.1


If the set of all possible values of a random variable, X, is a countable set,
x_1, x_2, ..., x_n or x_1, x_2, ..., then X is called a discrete random variable. The
function

$$ f(x) = P[X = x] \qquad x = x_1, x_2, \ldots \qquad (2.2.1) $$

that assigns the probability to each possible value x will be called the discrete
probability density function (discrete pdf).


If it is clear from the context that X is discrete, then we simply will say pdf.
Another common terminology is probability mass function (pmf), and the possible
values, x_i, are called mass points of X. Sometimes a subscripted notation, f_X(x), is
used.

The following theorem gives general properties that any discrete pdf must
satisfy. 


Theorem 2.2.1
A function f(x) is a discrete pdf if and only if it satisfies both of the following
properties for at most a countably infinite set of reals x_1, x_2, ...:


$$ f(x_i) \ge 0 \qquad (2.2.2) $$

for all x_i, and

$$ \sum_{\text{all } x_i} f(x_i) = 1 \qquad (2.2.3) $$
Proof
Property (2.2.2) follows from the fact that the value of a discrete pdf is a
probability and must be nonnegative. Because x_1, x_2, ... represent all possible
values of X, the events [X = x_1], [X = x_2], ... constitute an exhaustive partition
of the sample space. Thus,

$$ \sum_{\text{all } x_i} f(x_i) = \sum_{\text{all } x_i} P[X = x_i] = 1 $$

Consequently, any pdf must satisfy properties (2.2.2) and (2.2.3), and any function
that satisfies properties (2.2.2) and (2.2.3) will assign probabilities consistent
with Definition 1.3.1.
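For a pdf with finitely many mass points, Theorem 2.2.1 suggests a direct check. The function below is a hypothetical helper, written under the assumption that the pdf is stored as a dictionary {x_i: f(x_i)} with exact `Fraction` values (with floats, the equality test in (2.2.3) would need a tolerance). A countably infinite support would instead require showing that the series in (2.2.3) converges to 1, which a finite computation cannot do.

```python
from fractions import Fraction

def is_discrete_pdf(f):
    """Check (2.2.2) and (2.2.3) for a pdf stored as {x_i: f(x_i)} with finitely many points."""
    nonnegative = all(p >= 0 for p in f.values())   # property (2.2.2)
    sums_to_one = sum(f.values()) == 1               # property (2.2.3), exact with Fractions
    return nonnegative and sums_to_one

# pdf of the maximum of two rolls of a four-sided die (Table 2.1)
f_max = {1: Fraction(1, 16), 2: Fraction(3, 16), 3: Fraction(5, 16), 4: Fraction(7, 16)}

print(is_discrete_pdf(f_max))                                     # True
print(is_discrete_pdf({0: Fraction(1, 2), 1: Fraction(1, 3)}))    # False: total is 5/6
print(is_discrete_pdf({0: Fraction(3, 2), 1: Fraction(-1, 2)}))   # False: negative value
```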


In some problems, it is possible to express the pdf by means of an equation,
such as equation (2.1.4). However, it is sometimes more convenient to express it
in tabular form. For example, one way to specify the pdf of the random variable X
in Example 2.1.1 is given in Table 2.1.


TABLE 2.1 Values of the discrete pdf of the maximum of two rolls of a four-sided die

x        1      2      3      4
f(x)   1/16   3/16   5/16   7/16


Of course, these are the probabilities, respectively, of the events B_1, B_2, B_3,
and B_4 in S.

A graphic representation of f(x) is also of some interest. It would be possible to 
leave f(x) undefined at points that are not possible values of X, but it is conve- 
nient to define f(x) as zero at such points. The graph of the pdf in Table 2.1 is 
shown in Figure 2.2. 
FIGURE 2.2 Discrete pdf of the maximum of two rolls of a four-sided die

Example 2.2.1
Example 2.1.1 involves two rolls of a four-sided die. Now we will roll a 12-sided
(dodecahedral) die twice. If each face is marked with an integer, 1 through 12,
then each value is equally likely to occur on a single roll of the die. As before, we
define a random variable X to be the maximum obtained on the two rolls. It is
not hard to see that for each value x there is an odd number, 2x − 1, of ways
for that value to occur: of the x^2 pairs with both rolls at most x, we exclude the
(x − 1)^2 pairs with both rolls at most x − 1. Thus, the pdf of X must have the form

$$ f(x) = c(2x - 1) \qquad x = 1, 2, \ldots, 12 \qquad (2.2.4) $$


One way to determine c would be to do a more complete analysis of the counting
problem, but another way would be to use equation (2.2.3). In particular,

$$ 1 = \sum_{x=1}^{12} f(x) = c\sum_{x=1}^{12}(2x - 1) = c\left[2\sum_{x=1}^{12} x - 12\right] = c\left[2\cdot\frac{12(13)}{2} - 12\right] = c(12)^2 $$

So c = 1/(12)^2 = 1/144.
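Both routes to c mentioned in Example 2.2.1 can be compared in a few lines. This sketch computes c from the normalizing condition (2.2.3) and, as a cross-check, counts the pairs among the 144 outcomes whose maximum is x, confirming that each count is 2x − 1. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

FACES = 12

# Route 1: the normalizing condition (2.2.3), 1 = c * sum of (2x - 1) for x = 1..12.
c = Fraction(1, sum(2 * x - 1 for x in range(1, FACES + 1)))

# Route 2: count the outcomes (i, j) whose maximum equals x among the 12 * 12 pairs.
counts = {x: 0 for x in range(1, FACES + 1)}
for i, j in product(range(1, FACES + 1), repeat=2):
    counts[max(i, j)] += 1

print(c)                                                                       # 1/144
print(all(counts[x] == 2 * x - 1 for x in counts))                             # True
print(all(c * counts[x] == Fraction(counts[x], FACES ** 2) for x in counts))   # True
```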


As mentioned in the last section, another way to specify the distribution of
probability is to assign probabilities to intervals of the form (−∞, x], for all real
x. The probability assigned to such an event is given by a function called the
cumulative distribution function.


Definition 2.2.2 


The cumulative distribution function (CDF) of a random variable X is defined for
any real x by

$$ F(x) = P[X \le x] \qquad (2.2.5) $$

FIGURE 2.3 The CDF of the maximum of two rolls of a four-sided die

The function F(x) often is referred to simply as the distribution function of X,
and the subscripted notation, F_X(x), sometimes is used.

For brevity, we often will use a short notation to indicate that a distribution of
a particular form is appropriate. If we write X ~ f(x) or X ~ F(x), this will
mean that the random variable X has pdf f(x) and CDF F(x).

As seen in Figure 2.3, the CDF of the distribution given in Table 2.1 is a
nondecreasing step function. The step-function form of F(x) is common to all
discrete distributions, and the sizes of the steps or jumps in the graph of F(x)
correspond to the values of f(x) at those points. This is easily seen by comparing
Figures 2.2 and 2.3.

The general relationship between F(x) and f(x) for a discrete distribution is
given by the following theorem.

Theorem 2.2.2
Let X be a discrete random variable with pdf f(x) and CDF F(x). If the possible
values of X are indexed in increasing order, x_1 < x_2 < x_3 < ..., then
f(x_1) = F(x_1), and for any i > 1,

$$ f(x_i) = F(x_i) - F(x_{i-1}) \qquad (2.2.6) $$

Furthermore, if x < x_1, then F(x) = 0, and for any other real x

$$ F(x) = \sum_{x_i \le x} f(x_i) \qquad (2.2.7) $$

where the summation is taken over all indices i such that x_i ≤ x.
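Theorem 2.2.2 says that the CDF is a running total of the pdf and that the pdf is recovered from successive differences of the CDF. The sketch below implements both directions for the pdf of Table 2.1; using `bisect_right` to count the mass points x_i ≤ x is our implementation choice, and it lets F be evaluated at any real x, not only at the mass points.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# pdf of the maximum of two rolls of a four-sided die (Table 2.1)
xs = [1, 2, 3, 4]                                    # mass points in increasing order
fs = [Fraction(1, 16), Fraction(3, 16), Fraction(5, 16), Fraction(7, 16)]
Fs = list(accumulate(fs))                            # F(x_i) = f(x_1) + ... + f(x_i)

def F(x):
    """Step-function CDF, equation (2.2.7): sum of f(x_i) over x_i <= x."""
    k = bisect_right(xs, x)       # number of mass points x_i with x_i <= x
    return Fraction(0) if k == 0 else Fs[k - 1]

# Recover the pdf with equation (2.2.6): f(x_1) = F(x_1), f(x_i) = F(x_i) - F(x_{i-1}).
recovered = [F(xs[0])] + [F(xs[i]) - F(xs[i - 1]) for i in range(1, len(xs))]

print([str(F(x)) for x in (0.5, 1, 2.7, 3, 10)])   # ['0', '1/16', '1/4', '9/16', '1']
print(recovered == fs)                              # True
```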


The CDF of any random variable must satisfy the properties of the following 
theorem. 
Theorem 2.2.3 A function F(x) is a CDF for some random variable X if and only if it satisfies
the following properties:

$$ \lim_{x \to -\infty} F(x) = 0 \qquad (2.2.8) $$

$$ \lim_{x \to \infty} F(x) = 1 \qquad (2.2.9) $$

$$ \lim_{h \to 0^{+}} F(x + h) = F(x) \qquad (2.2.10) $$

$$ a < b \text{ implies } F(a) \le F(b) \qquad (2.2.11) $$


The first two properties say that F(x) can be made arbitrarily close to 0 or 1 by
taking x negative or positive, respectively, and sufficiently large in absolute value.
In the examples considered so far, it turns out that F(x) actually assumes these
limiting values. Property (2.2.10) says that F(x) is continuous from the right. Notice
that in Figure 2.3 the only discontinuities are at the values 1, 2, 3, and 4, and the
limit as x approaches these values from the right is the value of F(x) at these values.
On the other hand, as x approaches these values from the left, the limit of F(x) is
the value of F(x) on the lower step, so F(x) is not (in general) continuous from
the left. Property (2.2.11) says that F(x) is nondecreasing, which is easily seen to be
the case in Figure 2.3. In general, this property follows from the fact that an
interval of the form (−∞, b] can be represented as the union of two disjoint
intervals

$$ (-\infty, b] = (-\infty, a] \cup (a, b] \qquad (2.2.12) $$

for any a < b. It follows that F(b) = F(a) + P[a < X ≤ b] ≥ F(a), because
P[a < X ≤ b] ≥ 0, and thus equation (2.2.11) is obtained.

Actually, by this argument we have obtained another very useful result,
namely,

$$ P[a < X \le b] = F(b) - F(a) \qquad (2.2.13) $$


This reduces the problem of computing probabilities for events defined in terms 
of intervals of the form (a, b] to taking differences with F(x). 
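As a numerical illustration of equation (2.2.13), the sketch below computes P[a < X ≤ b] for the die example in two ways, by summing f over the mass points in (a, b] and by the difference F(b) − F(a), and checks that the two agree on a small grid of a < b values that includes non-integers. The grid is an arbitrary choice made for illustration.

```python
from fractions import Fraction

# pdf of the maximum of two rolls of a four-sided die (Table 2.1)
f = {1: Fraction(1, 16), 2: Fraction(3, 16), 3: Fraction(5, 16), 4: Fraction(7, 16)}

def F(x):
    """CDF F(x) = P[X <= x]."""
    return sum((p for xi, p in f.items() if xi <= x), Fraction(0))

def prob_interval(a, b):
    """P[a < X <= b], computed directly by summing f over the mass points in (a, b]."""
    return sum((p for xi, p in f.items() if a < xi <= b), Fraction(0))

grid = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5]
# Equation (2.2.13): the direct sum equals F(b) - F(a) for every a < b on the grid.
agree = all(prob_interval(a, b) == F(b) - F(a) for a in grid for b in grid if a < b)

print(agree)         # True
print(F(3) - F(2))   # 5/16, that is, P[X = 3] = P[2 < X <= 3]
```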

Generally, it is somewhat easier to understand the nature of a random variable 
and its probability distribution by considering the pdf directly, rather than the 
CDF, although the CDF will provide a good basis for defining continuous prob- 
ability distributions. This will be considered in the next section. 

Some important properties of probability distributions involve numerical
quantities called expected values.
Definition 2.2.3
If X is a discrete random variable with pdf f(x), then the expected value of X is
defined by

$$ E(X) = \sum_{x} x f(x) \qquad (2.2.14) $$


The sum (2.2.14) is understood to be over all possible values of X. Furthermore,
it is an ordinary sum if the range of X is finite, and an infinite series if the
range of X is infinite. In the latter case, if the infinite series is not absolutely
convergent, then we will say that E(X) does not exist. Other common notations
for E(X) include μ, possibly with a subscript, μ_X. The terms mean and expectation
also are often used.

The mean or expected value of a random variable is a “weighted average,” and 
it can be considered as a measure of the “center” of the associated probability 
distribution. 


Example 2.2.2 A box contains four chips. Two are labeled with the number 2, one is labeled 


with a 4, and the other with an 8. The average of the numbers on the four chips is 
(2+ 2+ 4+ 8)/4 = 4. The experiment of choosing a chip at random and record- 
ing its number can be associated with a discrete random variable X having dis- 
tinct values x=2, 4, or 8, with f(2)=1/2 and f(4)= f(8)=1/4. The 
corresponding expected value or mean is 


$$ E(X) = 2\left(\frac{1}{2}\right) + 4\left(\frac{1}{4}\right) + 8\left(\frac{1}{4}\right) = 4 $$

as before. Notice that this also could model selection from a larger collection, as
long as the possible observed values of X and the respective proportions in the
collection, f(x), remain the same as in the present example.
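Definition 2.2.3 reduces to a one-line sum when X has finitely many values. The helper below is an illustrative sketch (it assumes a finite support stored as {x: f(x)}) applied to the chip-drawing pdf of Example 2.2.2, alongside the plain average of the four chip labels.

```python
from fractions import Fraction

def expected_value(f):
    """E(X) = sum of x * f(x) over the mass points, equation (2.2.14), finite support."""
    return sum(x * p for x, p in f.items())

# Chip-drawing pdf of Example 2.2.2: f(2) = 1/2, f(4) = f(8) = 1/4.
f_chip = {2: Fraction(1, 2), 4: Fraction(1, 4), 8: Fraction(1, 4)}
print(expected_value(f_chip))             # 4

# The same value as the ordinary average of the chip labels 2, 2, 4, 8.
chips = [2, 2, 4, 8]
print(Fraction(sum(chips), len(chips)))   # 4
```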


There is an analogy between the distribution of probability to values, x, and
the distribution of mass to points in a physical system. For example, if masses of
0.5, 0.25, and 0.25 grams are placed at the respective points x = 2, 4, and 8 cm on
the horizontal axis, then the value 2(0.5) + 4(0.25) + 8(0.25) = 4 is the “center of
mass” or balance point of the corresponding system. This is illustrated in
Figure 2.4.

FIGURE 2.4 The center-of-mass interpretation of the mean
In the previous example E(X) coincides with one of the possible values of X,
but this is not always the case, as illustrated by the following example.

Example 2.2.3
A game of chance is based on drawing two chips at random without replacement
from the box considered in Example 2.2.2. If the numbers on the two chips 
match, then the player wins $2; otherwise, she loses $1. Let X be the amount won 
by the player on a single play of the game. There are only two possible values, 
X = 2 if both chips bear the number 2, and X = —1 otherwise. Furthermore, 


there are $\binom{4}{2} = 6$ ways to draw two chips, and only one of these outcomes
corresponds to a match. The distribution of X is f(2) = 1/6 and f(−1) = 5/6, and
consequently the expected amount won is E(X) = (−1)(5/6) + (2)(1/6) = −1/2. Thus,
the expected amount “won” by the player is actually an expected loss of one-half
dollar.
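The distribution in Example 2.2.3 can also be obtained by enumerating the six equally likely draws. In this sketch the two chips labeled 2 are treated as distinct objects by working with chip positions; the function name and its `payoff` and `loss` parameters are our own, added so that the same code also evaluates the alternative $5 payoff considered in the text.

```python
from fractions import Fraction
from itertools import combinations

CHIPS = [2, 2, 4, 8]   # the box of Example 2.2.2

def expected_winnings(payoff, loss=1):
    """E(X) for the two-chip game: win `payoff` on a match, otherwise lose `loss`.

    Enumerates the C(4, 2) = 6 equally likely unordered draws by chip position,
    so the two chips labeled 2 count as distinct objects.
    """
    draws = list(combinations(range(len(CHIPS)), 2))
    matches = sum(1 for i, j in draws if CHIPS[i] == CHIPS[j])
    p_match = Fraction(matches, len(draws))
    return payoff * p_match + (-loss) * (1 - p_match)

print(expected_winnings(2))   # -1/2
print(expected_winnings(5))   # 0, a fair game
```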

The connection with long-term relative frequency also is well illustrated by this
example. Suppose the game is played M times in succession, and denote the
relative frequencies of winning and losing by f_W and f_L, respectively. The average
amount the player wins is (−1)f_L + (2)f_W. Because of statistical regularity, we
have that f_L and f_W approach f(−1) and f(2), respectively, and thus the player’s
average winnings approach E(X) as M approaches infinity.
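The statistical-regularity argument can be watched in a small Monte Carlo sketch. It uses Python's standard `random` module with a fixed, arbitrary seed purely so that runs are reproducible; the simulated averages are random, so the printed values depend on the seed, but they should settle near E(X) = −1/2 as M grows.

```python
import random

def average_winnings(M, payoff=2, seed=0):
    """Play the two-chip game M times and return the average amount won."""
    rng = random.Random(seed)            # fixed seed only for reproducibility
    chips = [2, 2, 4, 8]
    total = 0
    for _ in range(M):
        a, b = rng.sample(chips, 2)      # two chips drawn without replacement
        total += payoff if a == b else -1
    return total / M

for M in (100, 10_000, 100_000):
    print(M, average_winnings(M))
```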

Notice also that the game will be more equitable if the payoff to the player is 
changed to $5 rather than $2, because the resulting expected amount won then 
will be (−1)(5/6) + (5)(1/6) = 0. In general, for a game of chance, if the net
amount won by a player is X, then the game is said to be a fair game if E(X) = 0. 
2.3 CONTINUOUS RANDOM VARIABLES

The notion of a discrete random variable provides an adequate means of probability
modeling for a large class of problems, including those that arise from the
operation of counting. However, a discrete random variable is not an adequate
model in many situations, and we must consider the notion of a continuous
random variable. The CDF defined earlier remains meaningful for continuous
random variables, but it also is useful to extend the concept of a pdf to continuous
random variables.

Example 2.3.1
Each work day a man rides a bus to his place of business. Although a new bus
arrives promptly every five minutes, the man generally arrives at the bus stop at a 
random time between bus arrivals. Thus, we might take his waiting time on any 
given morning to be a random variable X. 

Although in practice we usually measure time only to the nearest unit (seconds,
minutes, etc.), in theory we could measure time to within some arbitrarily small
unit. Thus, even though in practice it might be possible to regard X as a discrete
random variable with possible values determined by the smallest appropriate
time unit, it usually is more convenient to consider the idealized situation in
which X is assumed capable of attaining any value in some interval, and not just
discrete points.

Returning to the man waiting for his bus, suppose that he is very observant
and has noticed over the years that the frequency of days when he waits no more
than x minutes for the bus is proportional to x for all x. This suggests a CDF of
the form F(x) = P[X ≤ x] = cx, for some constant c > 0. Because the buses
arrive at regular five-minute intervals, the range of possible values of X is the
time interval [0, 5]. In other words, P[0 ≤ X ≤ 5] = 1, and it follows that
1 = F(5) = c · 5, and thus c = 1/5, and F(x) = x/5 if 0 ≤ x ≤ 5. It also follows
that F(x) = 0 if x < 0 and F(x) = 1 if x > 5.

Another way to study this distribution would be to observe the relative frequency
of bus arrivals during short time intervals of the same length, but distributed
throughout the waiting-time interval [0, 5]. It may be that the frequency of
bus arrivals during intervals of the form (x, x + Δx] for small Δx was proportional
to the length of the interval, Δx, regardless of the value of x. The corresponding
condition this imposes on the distribution of X is

$$ P[x < X \le x + \Delta x] = F(x + \Delta x) - F(x) = c\,\Delta x $$

for all 0 ≤ x < x + Δx ≤ 5 and some c > 0. Of course, this implies that if F(x) is
differentiable at x, its derivative is constant, F′(x) = c > 0. Note also that for
x < 0 or x > 5, the derivative also exists, but F′(x) = 0 because
P[x < X ≤ x + Δx] = 0 when x and x + Δx are not possible values of X, and
the derivative does not exist at all at x = 0 or 5.

In general, if F(x) is the CDF of a continuous random variable X, then we will
denote its derivative (where it exists) by f(x), and under certain conditions, which
will be specified shortly, we will call f(x) the probability density function of X. In
our example, F(x) can be represented for values of x in the interval [0, 5] as the
integral of its derivative:

$$ F(x) = \int_0^x f(t)\,dt = \int_0^x \frac{1}{5}\,dt = \frac{x}{5} $$

The graphs of F(x) and f(x) are shown in Figure 2.5.

FIGURE 2.5 CDF and pdf of waiting time for a bus
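The waiting-time model of Example 2.3.1 can be coded directly. The sketch below defines the pdf and CDF found above, approximates F(x) by integrating f numerically (the midpoint rule is our choice; any quadrature rule would do), and checks that increments over short intervals inside [0, 5] equal cΔx with c = 1/5. Floating-point values are used, so comparisons allow small tolerances.

```python
def f(x):
    """pdf of the waiting time in Example 2.3.1: 1/5 on [0, 5], 0 elsewhere."""
    return 0.2 if 0 <= x <= 5 else 0.0

def F(x):
    """CDF of the waiting time: 0 for x < 0, x/5 on [0, 5], 1 for x > 5."""
    return 0.0 if x < 0 else min(x / 5, 1.0)

def F_by_integration(x, n=10_000):
    """Midpoint-rule approximation of F(x) as the integral of f(t) dt up to x.

    f is zero below 0, so the integral only needs to start at t = 0.
    """
    if x <= 0:
        return 0.0
    h = x / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

for x in (-1, 1, 2.5, 5, 7):
    print(x, F(x), round(F_by_integration(x), 4))

# Increments over short intervals inside [0, 5] are c * dx with c = 1/5.
dx = 0.01
print(all(abs((F(x + dx) - F(x)) - dx / 5) < 1e-12 for x in (0.0, 1.3, 2.2, 4.5)))
```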
This provides a general approach to defining the distribution of a continuous
random variable X.


Definition 2.3.1
A random variable X is called a continuous random variable if there is a function
f(x), called the probability density function (pdf) of X, such that the CDF can be
represented as

$$ F(x) = \int_{-\infty}^{x} f(t)\,dt \qquad (2.3.1) $$


In more advanced treatments of probability, such distributions sometimes are
called “absolutely continuous” distributions. The reason for such a distinction is
that CDFs exist that are continuous (in the usual sense), but which cannot be
represented as the integral of the derivative. We will apply the terminology
continuous distribution only to probability distributions that satisfy property (2.3.1).

Sometimes it is convenient to use a subscripted notation, F_X(x) and f_X(x), for
the CDF and pdf, respectively.

The defining property (2.3.1) provides a way to derive the CDF when the pdf is
given, and it follows by the Fundamental Theorem of Calculus that the pdf can
be obtained from the CDF by differentiation. Specifically,

$$ f(x) = \frac{d}{dx} F(x) = F'(x) \qquad (2.3.2) $$

wherever the derivative exists. Recall from Example 2.3.1 that there were two
values of x where the derivative of F(x) did not exist. In general, there may be
many values of x where F(x) is not differentiable, and these will occur at
discontinuity points of the pdf, f(x). Inspection of the graphs of f(x) and F(x) in
Figure 2.5 shows that this situation occurs in the example at x = 0 and x = 5.
However, this will not usually create a problem if the set of such values is finite,
because an integrand can be redefined arbitrarily at a finite number of values x
without affecting the value of the integral. Thus, the function F(x), as represented
in property (2.3.1), is unaffected regardless of how we treat such values. It also
follows by similar considerations that events such as [X = c], where c is a constant,
will have probability zero when X is a continuous random variable. Consequently,
events of the form [X ∈ I], where I is an interval, are assigned the same probability
whether I includes the endpoints or not. In other words, for a continuous
random variable X, if a < b,

$$ P[a \le X \le b] = P[a < X \le b] = P[a \le X < b] = P[a < X < b] \qquad (2.3.3) $$

and each of these has the value F(b) − F(a).
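Two consequences of this discussion can be seen numerically for the bus example: difference quotients of F recover f = 1/5 inside (0, 5) but disagree from the left and right at x = 0 and x = 5, and F(c) − F(c − h), an upper bound for P[X = c], shrinks to 0 with h. The step size h and the helper names are illustrative assumptions.

```python
def F(x):
    """CDF of the bus waiting time in Example 2.3.1: 0, then x/5 on [0, 5], then 1."""
    return 0.0 if x < 0 else min(x / 5, 1.0)

def derivative(cdf, x, h=1e-6):
    """Central-difference estimate of cdf'(x)."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)

def one_sided(cdf, x, h=1e-6):
    """Left and right difference quotients of cdf at x."""
    return (cdf(x) - cdf(x - h)) / h, (cdf(x + h) - cdf(x)) / h

print(round(derivative(F, 2.0), 6))             # 0.2, the value f(2) = 1/5
print([round(d, 6) for d in one_sided(F, 0)])   # [0.0, 0.2]: no derivative at x = 0
print([round(d, 6) for d in one_sided(F, 5)])   # [0.2, 0.0]: no derivative at x = 5

# P[X = c] <= P[c - h < X <= c] = F(c) - F(c - h), and this bound shrinks to 0 with h,
# so including or excluding endpoints does not change the probabilities in (2.3.3).
c = 3.0
print([F(c) - F(c - h) for h in (1e-1, 1e-3, 1e-6)])
```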


Thus, the CDF, F(x), assigns probabilities to events of the form (−∞, x], and
equation (2.3.3) shows how the probability assignment can be extended to any
interval.

Any function f(x) may be considered as a possible candidate for a pdf if it
produces a legitimate CDF when integrated as in property (2.3.1). The following
theorem provides conditions that will guarantee this.


Theorem 2.3.1

A function f(x) is a pdf for some continuous random variable X if and only if it
satisfies the properties

$$f(x) \ge 0 \qquad (2.3.4)$$

for all real x, and

$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad (2.3.5)$$


Proof

Properties (2.2.9) and (2.2.11) of a CDF follow from properties (2.3.5) and (2.3.4),
respectively. The other properties follow from general results about integrals.


Example 2.3.2

A machine produces copper wire, and occasionally there is a flaw at some point
along the wire. The length of wire (in meters) produced between successive flaws
is a continuous random variable X with pdf of the form

$$f(x) = \begin{cases} c(1+x)^{-3} & x > 0 \\ 0 & x \le 0 \end{cases} \qquad (2.3.6)$$


where c is a constant. The value of c can be determined by means of property
(2.3.5). Specifically, set

$$1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} c(1+x)^{-3}\,dx = \frac{c}{2}$$

which is obtained following the substitution u = 1 + x and an application of the
power rule for integrals. This implies that the constant is c = 2.
Clearly property (2.3.4) also is satisfied in this case. 
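The CDF for this random variable is given by

$$\begin{aligned} F(x) = P[X \le x] &= \int_{-\infty}^{x} f(t)\,dt \\ &= \begin{cases} \int_0^x 2(1+t)^{-3}\,dt & x > 0 \\ \int_{-\infty}^x 0\,dt & x \le 0 \end{cases} \\ &= \begin{cases} 1 - (1+x)^{-2} & x > 0 \\ 0 & x \le 0 \end{cases} \end{aligned}$$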


Probabilities of intervals, such as P[a ≤ X ≤ b], can be expressed directly in
terms of the CDF or as integrals of the pdf. For example, the probability that a
flaw occurs between 0.40 and 0.45 meters is given by

$$P[0.40 \le X \le 0.45] = \int_{0.40}^{0.45} f(x)\,dx = F(0.45) - F(0.40) = 0.035$$


Consideration of the frequency of occurrences over short intervals was suggested
as a possible way to study a continuous distribution in Example 2.3.1.
This approach provides some insight into the general nature of continuous
distributions. For example, it may be observed that the frequency of occurrences
over short intervals of length Δx, say [x, x + Δx], is at least approximately
proportional to the length of the interval, Δx, where the proportionality factor
depends on x, say f(x). The condition this imposes on the distribution of X is

$$P[x \le X \le x + \Delta x] = F(x + \Delta x) - F(x) \approx f(x)\,\Delta x \qquad (2.3.7)$$


where the error in the approximation is negligible relative to the length of the
interval, Δx. This is illustrated in Figure 2.6 for the copper wire example.

The exact probability in equation (2.3.7) is represented by the area of the
shaded region under the graph of f(x), while the approximation is the area of the
corresponding rectangle with height f(x) and width Δx.

The smaller the value of Δx, the closer this approximation becomes. In this
sense, it might be reasonable to think of f(x) as assigning “probability density”
for the distribution of X, and the term probability density function seems
appropriate for f(x). In other words, for a continuous random variable X, f(x) is not a
probability, although it does determine the probability assigned to arbitrarily
small intervals. The area between the x-axis and the graph of f(x) assigns
probability to intervals, so that for a < b,


$$P[a \le X \le b] = \int_a^b f(x)\,dx \qquad (2.3.8)$$


FIGURE 2.6 Continuous assignment of probability by pdf


In Example 2.3.2, we could take the probability that the length between
successive flaws lies between 0.40 and 0.45 meters to be approximately
$f(0.40)(0.05) = 2(1.4)^{-3}(0.05) = 0.036$, or we could integrate the pdf between the
limits 0.40 and 0.45 to obtain the exact answer, 0.035. For longer intervals,
integrating f(x) as in equation (2.3.8) would be more reasonable.
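The two numbers above can be reproduced with a short Python sketch. The helper names `pdf` and `cdf` are illustrative choices rather than notation from the text; the formulas are those of Example 2.3.2 with c = 2.

```python
def pdf(x: float) -> float:
    """pdf of Example 2.3.2 with c = 2: f(x) = 2(1 + x)^(-3) for x > 0, else 0."""
    return 2.0 * (1.0 + x) ** -3 if x > 0 else 0.0


def cdf(x: float) -> float:
    """CDF of Example 2.3.2: F(x) = 1 - (1 + x)^(-2) for x > 0, else 0."""
    return 1.0 - (1.0 + x) ** -2 if x > 0 else 0.0


# Helper names pdf/cdf are hypothetical, chosen only for this illustration.
a, b = 0.40, 0.45
exact = cdf(b) - cdf(a)      # P[a <= X <= b] = F(b) - F(a)
approx = pdf(a) * (b - a)    # rectangle f(a) * dx, as in equation (2.3.7)

print(f"exact  = {exact:.3f}")   # 0.035
print(f"approx = {approx:.3f}")  # 0.036
```

Shrinking the interval (for example b = 0.401) brings the two values much closer, which is the sense in which f(x) is a "density".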

Note that in Section 2.2 we referred to a probability density function or density
function for a discrete random variable, but the interpretation there is different,
because probability is assigned at discrete points in that case rather than in a
continuous manner. However, it will be convenient to refer to the “density
function” or pdf in both continuous and discrete cases, and to use the same notation,
f(x) or $f_X(x)$, in the later chapters of the book. This will avoid the necessity of
separate statements of general results that apply to both cases.
The notion of expected value can be extended to continuous random variables. 


Definition 2.3.2 


If X is a continuous random variable with pdf f(x), then the expected value of X is
defined by

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \qquad (2.3.9)$$

if the integral in equation (2.3.9) is absolutely convergent. Otherwise we say that
E(X) does not exist.


As in the discrete case, other notations for E(X) are μ or $\mu_X$, and the terms
mean or expectation of X also are commonly used. The center-of-mass analogy is
still valid in this case, where mass is assigned to the x-axis in a continuous
manner and in accordance with f(x). Thus, μ can also be regarded as a central
measure for a continuous distribution.

In Example 2.3.2, the mean length between flaws in a piece of wire is

$$\mu = \int_{-\infty}^{0} x \cdot 0\,dx + \int_0^{\infty} x \cdot 2(1+x)^{-3}\,dx$$

If we make the substitution t = 1 + x, then

$$\mu = 2\int_1^{\infty} (t - 1)t^{-3}\,dt = 2\left(1 - \frac{1}{2}\right) = 1$$
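The normalization c = 2 and the mean μ = 1 can also be checked numerically. The sketch below is one possible approach, not the method of the text: it maps (0, ∞) onto (0, 1) with the change of variables x = u/(1 − u), an assumption made for this illustration, and then applies the midpoint rule. The helper name `integrate_half_line` is hypothetical.

```python
def integrate_half_line(g, n: int = 100_000) -> float:
    """Approximate the integral of g over (0, inf) by the midpoint rule after the
    change of variables x = u / (1 - u), which maps (0, 1) onto (0, inf)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h                  # midpoint of the ith subinterval of (0, 1)
        x = u / (1.0 - u)
        jacobian = 1.0 / (1.0 - u) ** 2    # dx/du
        total += g(x) * jacobian
    return total * h


def pdf(x: float) -> float:
    """f(x) = 2(1 + x)^(-3) of Example 2.3.2, used here only on x > 0."""
    return 2.0 * (1.0 + x) ** -3


total_mass = integrate_half_line(pdf)              # property (2.3.5): should be 1
mean = integrate_half_line(lambda x: x * pdf(x))   # equation (2.3.9): should be 1
print(round(total_mass, 6), round(mean, 6))
```

For this particular pdf both transformed integrands are linear in u (they reduce to 2(1 − u) and 2u), so the midpoint rule is essentially exact; for other integrands it is only an approximation whose accuracy depends on n.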


Other properties of probability distributions can be described in terms of 
quantities called percentiles. 


Definition 2.3.3 


If 0 < p < 1, then a 100 × pth percentile of the distribution of a continuous random
variable X is a solution $x_p$ to the equation

$$F(x_p) = p \qquad (2.3.10)$$


In general, a distribution may not be continuous, and if it has a discontinuity,
then there will be some values of p for which equation (2.3.10) has no solution.
Although we emphasize the continuous case in this book, it is possible to state a
general definition of percentile by defining a pth percentile of the distribution of
X to be a value $x_p$ such that $P[X \le x_p] \ge p$ and $P[X \ge x_p] \ge 1 - p$.

In essence, $x_p$ is a value such that 100 × p percent of the population values are
at most $x_p$, and 100 × (1 − p) percent of the population values are at least $x_p$.
This is illustrated for a continuous distribution in Figure 2.7. We also can think
in terms of a proportion p rather than a percentage 100 × p of the population,
and in this context $x_p$ is called a pth quantile of the distribution.


FIGURE 2.7 A 100 × pth percentile


A median of the distribution of X is a 50th percentile, denoted by $x_{0.5}$ or m.
This is an important special case of the percentile such that half of the population
values are above it and half are below it. The median is used in some applications
instead of the mean as a central measure.


Example 2.3.3

Consider the distribution of lifetimes, X (in months), of a particular type of
component. We will assume that the CDF has the form

$$F(x) = 1 - e^{-(x/3)^2} \qquad x > 0$$

and zero otherwise. Solving $F(x_p) = p$ for this CDF gives $x_p = 3[-\ln(1 - p)]^{1/2}$,
so the median lifetime is

$$m = 3[-\ln(1 - 0.5)]^{1/2} = 3\sqrt{\ln 2} = 2.498 \text{ months}$$

It is desired to find the time t such that 10% of the components fail before t.
This is the 10th percentile:

$$x_{0.10} = 3[-\ln(1 - 0.1)]^{1/2} = 3\sqrt{-\ln(0.9)} = 0.974 \text{ months}$$

Thus, if the components are guaranteed for one month, slightly more than 10%
will need to be replaced, since $F(1) = 1 - e^{-1/9} \approx 0.105$.
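Because this CDF can be inverted in closed form, every percentile follows from $x_p = 3[-\ln(1 - p)]^{1/2}$. A Python sketch of that calculation; the helper names `cdf` and `percentile` are hypothetical.

```python
import math


def cdf(x: float) -> float:
    """CDF of Example 2.3.3: F(x) = 1 - exp(-(x/3)^2) for x > 0, else 0."""
    return 1.0 - math.exp(-((x / 3.0) ** 2)) if x > 0 else 0.0


def percentile(p: float) -> float:
    """Solve F(x_p) = p (equation 2.3.10) in closed form: x_p = 3 sqrt(-ln(1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 3.0 * math.sqrt(-math.log(1.0 - p))


median = percentile(0.5)   # 3 sqrt(ln 2), about 2.498 months
x10 = percentile(0.10)     # about 0.974 months
print(f"median = {median:.3f}, 10th percentile = {x10:.3f}")
print(f"P[X <= 1] = {cdf(1.0):.4f}")   # slightly more than 0.10
```

Plugging each percentile back into `cdf` is a convenient self-check, since it should return p.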


Another measure of central tendency, which is sometimes considered, is the 
mode. 


Definition 2.3.4 


If the pdf has a unique maximum at $x = m_0$, say $\max f(x) = f(m_0)$, then $m_0$ is called
the mode of X.


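In the previous example, the pdf of the distribution of lifetimes is

$$f(x) = \frac{2}{9}\,x\,e^{-(x/3)^2} \qquad x > 0$$

Since $f'(x) = \frac{2}{9}e^{-(x/3)^2}\left(1 - \frac{2x^2}{9}\right)$, the solution to f′(x) = 0 is the
unique maximum of f(x), $x = m_0 = 3\sqrt{2}/2 = 2.121$ months.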
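The mode can also be located numerically. The sketch below uses ternary search, a technique chosen here for illustration, which assumes f is unimodal on the search interval [0, 10]; that holds for this pdf, which increases up to $m_0$ and decreases afterward. The helper name `argmax_unimodal` is hypothetical.

```python
import math


def pdf(x: float) -> float:
    """pdf of Example 2.3.3: f(x) = (2/9) x exp(-(x/3)^2) for x > 0."""
    return (2.0 / 9.0) * x * math.exp(-((x / 3.0) ** 2))


def argmax_unimodal(f, lo: float, hi: float, iters: int = 200) -> float:
    """Ternary search for the maximizer of a function assumed unimodal on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1   # the maximum cannot lie to the left of m1
        else:
            hi = m2   # the maximum cannot lie to the right of m2
    return (lo + hi) / 2.0


mode = argmax_unimodal(pdf, 0.0, 10.0)
print(f"numerical mode = {mode:.4f}, closed form 3/sqrt(2) = {3 / math.sqrt(2):.4f}")
```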

In general, the mean, median, and mode may all be different, but there are
cases in which they all agree.


Definition 2.3.5 
A distribution with pdf f(x) is said to be symmetric about c if f(c − x) = f(c + x) for
all x.


FIGURE 2.8 The pdf of a symmetric distribution


In other words, the “centered” pdf g(x) = f(c − x) is an even function, in the
usual sense that g(x) = g(−x). The graph of y = f(x) is a “mirror image” about
the vertical line x = c. Asymmetric distributions, such as the one in Example
2.3.2, are called skewed distributions.

If f(x) is symmetric about c and the mean μ exists, then c = μ. If, additionally,
f(x) has a unique maximum at $m_0$ and a unique median m, then $\mu = m_0 = m$. This
is illustrated in Figure 2.8.


MIXED DISTRIBUTIONS 


It is possible to have a random variable whose distribution is neither purely 
discrete nor continuous. A probability distribution for a random variable X is of
mixed type if the CDF has the form

$$F(x) = aF_d(x) + (1 - a)F_c(x)$$

where $F_d(x)$ and $F_c(x)$ are CDFs of discrete and continuous type, respectively, and
0 < a < 1.


Suppose that a driver encounters a stop sign and either waits for a random
period of time before proceeding or proceeds immediately. An appropriate model
would allow the waiting time to be either zero or positive, both with nonzero
probability. Let the CDF of the waiting time X be

$$F(x) = 0.4F_d(x) + 0.6F_c(x) = 0.4 + 0.6(1 - e^{-x}) \qquad x \ge 0$$

where $F_d(x) = 1$ and $F_c(x) = 1 - e^{-x}$ if x ≥ 0, and both are zero if x < 0. The
graph of F(x) is shown in Figure 2.9. Thus, the probability of proceeding
immediately is P[X = 0] = 0.4. The probability that the waiting time is less than 0.5
minutes is

$$P[X \le 0.5] = 0.4 + 0.6(1 - e^{-0.5}) = 0.636$$


FIGURE 2.9 The CDF of a mixed distribution

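The distribution of X given 0 < X corresponds to

$$\begin{aligned} P[X \le x \mid 0 < X] &= \frac{P[0 < X \text{ and } X \le x]}{P[0 < X]} = \frac{P[0 < X \le x]}{P[0 < X]} \\ &= \frac{F(x) - F(0)}{1 - F(0)} = \frac{0.4 + 0.6(1 - e^{-x}) - 0.4}{1 - 0.4} \\ &= 1 - e^{-x} \end{aligned}$$

for x > 0, which is the continuous component $F_c(x)$.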
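A Python sketch of the stop-sign model shows how the discrete part appears as a jump in F at zero and how the conditional CDF reduces to 1 − e^{−x}. The helper names are hypothetical, and the left limit F(0−) is approximated by evaluating F slightly below zero.

```python
import math


def cdf(x: float) -> float:
    """Mixed CDF of the stop-sign example: F(x) = 0.4 + 0.6(1 - e^(-x)) for x >= 0."""
    return 0.4 + 0.6 * (1.0 - math.exp(-x)) if x >= 0 else 0.0


def conditional_cdf(x: float) -> float:
    """P[X <= x | 0 < X] = (F(x) - F(0)) / (1 - F(0)) for x > 0, zero otherwise."""
    if x <= 0:
        return 0.0
    return (cdf(x) - cdf(0.0)) / (1.0 - cdf(0.0))


p_zero = cdf(0.0) - cdf(-1e-12)   # size of the jump at 0, i.e. P[X = 0]
print(f"P[X = 0]    = {p_zero:.3f}")
print(f"P[X <= 0.5] = {cdf(0.5):.3f}")
for x in (0.5, 1.0, 2.0):
    print(x, conditional_cdf(x), 1.0 - math.exp(-x))   # the two columns agree
```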


2.4 SOME PROPERTIES OF EXPECTED VALUES
It is useful to consider, more generally, the expected value of a function of X. For
example, if the radius of a disc is a random variable X, then the area of the disc,
say Y = πX², is a function of X. In general, let X be a random variable with pdf
f(x), and denote by u(x) a real-valued function whose domain includes the
possible values of X. If we let Y = u(X), then Y is a random variable with its own
pdf, say g(y). Suppose, for example, that X is a discrete random variable with pdf
f(x). Then Y = u(X) is also a discrete random variable with pdf g(y) and expected
value defined in accordance with Definition 2.2.3, namely $E(Y) = \sum_y y\,g(y)$. Of
course, evaluation of E(Y) directly from the definition requires knowing the pdf
g(y). The following theorem provides another way to evaluate this expected value.
The proof, which requires advanced methods, will be discussed in Chapter 6.


If X is a random variable with pdf f(x) and u(x) is a real-valued function whose
domain includes the possible values of X, then

$$E[u(X)] = \sum_x u(x) f(x) \qquad \text{if } X \text{ is discrete} \qquad (2.4.1)$$

$$E[u(X)] = \int_{-\infty}^{\infty} u(x) f(x)\,dx \qquad \text{if } X \text{ is continuous} \qquad (2.4.2)$$


It is clear that the expected value will have the “linearity” properties associated 
with integrals and sums. 


If X is a random variable with pdf f(x), a and b are constants, and g(x) and h(x) 
are real-valued functions whose domains include the possible values of X, then 


$$E[ag(X) + bh(X)] = aE[g(X)] + bE[h(X)] \qquad (2.4.3)$$


Proof 


Let X be continuous. It follows that 


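$$\begin{aligned} E[ag(X) + bh(X)] &= \int_{-\infty}^{\infty} [ag(x) + bh(x)] f(x)\,dx \\ &= a\int_{-\infty}^{\infty} g(x) f(x)\,dx + b\int_{-\infty}^{\infty} h(x) f(x)\,dx \\ &= aE[g(X)] + bE[h(X)] \end{aligned}$$

The discrete case is similar.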


An obvious result of this theorem is that

$$E(aX + b) = aE(X) + b \qquad (2.4.4)$$

An important special expected value is obtained if we consider the function
$u(x) = (x - \mu)^2$.
Definition 2.4.1


The variance of a random variable X is given by

$$\operatorname{Var}(X) = E[(X - \mu)^2]$$

Other common notations for the variance are σ², $\sigma_X^2$, or V(X), and a related
quantity, called the standard deviation of X, is the positive square root of the
variance, $\sigma = \sigma_X = \sqrt{\operatorname{Var}(X)}$.

The variance provides a measure of the variability or amount of “spread” in 
the distribution of a random variable. 


Example 2.4.1 In the experiment of Example 2.2.2, E(X²) = 2²(1/2) + 4²(1/4) + 8²(1/4) = 22, and
thus Var(X) = 22 − 4² = 6 and $\sigma_X = \sqrt{6} = 2.45$. For comparison, consider a
slightly different experiment where two chips are labeled with zeros, one with a 4,
and one with a 12. If one chip is selected at random, and Y is its number,
then E(Y) = 4 as in the original example. However, Var(Y) = 24 and
$\sigma_Y = 2\sqrt{6} > \sigma_X$, which reflects the fact that the probability distribution of Y has
more spread than that of X.
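Exact fractions make it easy to confirm these values. In the sketch below, the distribution of X (values 2, 4, 8 with probabilities 1/2, 1/4, 1/4) is read off from the expression for E(X²) above, and the helper names `mean` and `variance` are hypothetical. It also checks the shortcut E(X²) − μ², which appears as equation (2.4.8) later in this section.

```python
from fractions import Fraction
import math


def mean(pmf: dict) -> Fraction:
    """E(X) = sum of x f(x) for a discrete pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf: dict) -> Fraction:
    """Var(X) = E[(X - mu)^2], computed directly from the definition."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


q = Fraction(1, 4)
X = {2: 2 * q, 4: q, 8: q}    # distribution of X in Example 2.4.1
Y = {0: 2 * q, 4: q, 12: q}   # two chips labeled 0, one labeled 4, one labeled 12

for name, pmf in (("X", X), ("Y", Y)):
    mu, var = mean(pmf), variance(pmf)
    second = sum(x * x * p for x, p in pmf.items())   # E(X^2)
    assert var == second - mu ** 2                    # shortcut formula (2.4.8)
    print(name, "mean", mu, "variance", var, "sd", round(math.sqrt(var), 2))
```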


Certain special expected values, called moments, are useful in characterizing 
some features of the distribution.


Definition 2.4.2 

The kth moment about the origin of a random variable X is

$$\mu'_k = E(X^k)$$

and the kth moment about the mean is

$$\mu_k = E[X - E(X)]^k = E(X - \mu)^k$$


Thus $E(X^k)$ may be considered as the kth moment of X or as the first moment
of $X^k$. The first moment is the mean, and the simpler notation μ, rather than $\mu'_1$,
generally is preferred. The first moment about the mean is zero,

$$\mu_1 = E[X - E(X)] = E(X) - E(X) = 0$$

The second moment about the mean is the variance,

$$\mu_2 = E[(X - \mu)^2] = \sigma^2$$
and the second moment about the origin is involved in the following theorem
about the variance.


If X is a random variable, then

$$\operatorname{Var}(X) = E(X^2) - \mu^2 \qquad (2.4.8)$$


Proof 


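$$\begin{aligned} \operatorname{Var}(X) &= E(X^2 - 2\mu X + \mu^2) \\ &= E(X^2) - 2\mu E(X) + \mu^2 \\ &= E(X^2) - 2\mu^2 + \mu^2 \end{aligned}$$

which yields the theorem.
It also follows immediately that

$$E(X^2) = \sigma^2 + \mu^2 \qquad (2.4.9)$$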


As noted previously, the variance provides a measure of the amount of spread
in a distribution or the variability among members of a population. A rather
extreme example of this occurs when X assumes only one value, say
P[X = c] = 1. In this case E(X) = c and Var(X) = 0.

The remark following Theorem 2.4.2 dealt with the expected value of a linear 
function of a random variable. The following theorem deals with the variance. 


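If X is a random variable and a and b are constants, then

$$\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X) \qquad (2.4.10)$$


Proof

$$\begin{aligned} \operatorname{Var}(aX + b) &= E[(aX + b - a\mu - b)^2] \\ &= E[a^2(X - \mu)^2] \\ &= a^2 \operatorname{Var}(X) \end{aligned}$$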


This means that the variance is affected by a change of scale, but not by a
translation.
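The following sketch checks equations (2.4.4) and (2.4.10) exactly on the distribution of X from Example 2.4.1, for a few arbitrary choices of a and b. The helper names `moments` and `linear_transform` are hypothetical.

```python
from fractions import Fraction


def moments(pmf):
    """Return (mean, variance) of a discrete pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var


def linear_transform(pmf, a, b):
    """pmf of aX + b; probabilities of values that coincide (e.g. a = 0) are added."""
    out = {}
    for x, p in pmf.items():
        y = a * x + b
        out[y] = out.get(y, 0) + p
    return out


# Distribution of X from Example 2.4.1; the (a, b) pairs below are arbitrary choices.
X = {2: Fraction(1, 2), 4: Fraction(1, 4), 8: Fraction(1, 4)}
mu_x, var_x = moments(X)
for a, b in ((3, 0), (3, 100), (-2, 7), (0, 5)):
    mu_y, var_y = moments(linear_transform(X, a, b))
    assert mu_y == a * mu_x + b        # equation (2.4.4)
    assert var_y == a ** 2 * var_x     # equation (2.4.10): the shift b drops out
    print(f"a = {a:>2}, b = {b:>3}: E = {mu_y}, Var = {var_y}")
```

The case a = 0 collapses X to the single value b, which reproduces the degenerate example above with variance zero.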

Another natural measure of variability would be the mean absolute deviation,
E|X − μ|, but the variance is generally a more convenient quantity with which
to work.


The mean and variance provide a good deal of information about a population
distribution, but higher moments and other quantities also may be useful. For
example, the third moment about the mean, $\mu_3$, is a measure of asymmetry or
“skewness” of a distribution.


Theorem 2.4.5

If the distribution of X is symmetric about the mean μ = E(X), then the third
moment about μ is zero, $\mu_3 = 0$.


Proof

See Exercise 28.


We can conclude that if $\mu_3 \ne 0$, then the distribution is not symmetric, but not
conversely, because distributions exist that are not symmetric but which do have
$\mu_3 = 0$ (see Exercise 29).


BOUNDS ON PROBABILITY


It is possible, in some cases, to find bounds on probabilities based on moments.


Theorem 2.4.6

If X is a random variable and u(x) is a nonnegative real-valued function, then for
any positive constant c > 0,

$$P[u(X) \ge c] \le \frac{E[u(X)]}{c} \qquad (2.4.11)$$


Proof 


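If A = {x | u(x) ≥ c}, then for a continuous random variable,

$$\begin{aligned} E[u(X)] &= \int_{-\infty}^{\infty} u(x) f(x)\,dx \\ &= \int_{A} u(x) f(x)\,dx + \int_{A^c} u(x) f(x)\,dx \\ &\ge \int_{A} u(x) f(x)\,dx \\ &\ge \int_{A} c f(x)\,dx \\ &= cP[X \in A] \\ &= cP[u(X) \ge c] \end{aligned}$$

The first inequality holds because u(x) ≥ 0, and the second because u(x) ≥ c on A;
dividing by c gives (2.4.11).
A similar proof holds for discrete variables.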


A special case, known as the Markov inequality, is obtained by taking u(x) = |x|^r for
r > 0 and using c^r in place of c, namely

$$P[|X| \ge c] \le \frac{E(|X|^r)}{c^r} \qquad (2.4.12)$$
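As an illustration (not from the text), the Markov inequality with r = 1 can be compared with exact tail probabilities in the wire-flaw model of Example 2.3.2, where E(X) = 1 and P[X ≥ c] = 1 − F(c) = (1 + c)^{−2}. Larger r cannot be used there, because E(X²) does not exist for that pdf: the integrand x²·2(1 + x)^{−3} behaves like 2/x for large x. The helper name `tail_prob` is hypothetical.

```python
def tail_prob(c: float) -> float:
    """Exact P[X >= c] = 1 - F(c) = (1 + c)^(-2) for the wire-flaw model, c > 0."""
    return (1.0 + c) ** -2


MEAN = 1.0  # E(X) = E|X| for the wire-flaw model, computed earlier in this section

for c in (0.5, 1.0, 2.0, 5.0, 10.0):
    bound = MEAN / c          # Markov bound (2.4.12) with r = 1
    exact = tail_prob(c)
    assert exact <= bound
    print(f"c = {c:>4}: exact {exact:.4f} <= bound {bound:.4f}")
```

The bound is valid but loose here (for c = 10 it gives 0.1 against an exact value below 0.01), which is typical of inequalities that use only a single moment.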


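Another well-known result, the Chebychev inequality, is given by the following
theorem.


If X is a random variable with mean μ and variance σ², then for any k > 0,

$$P[|X - \mu| \ge k\sigma] \le \frac{1}{k^2} \qquad (2.4.13)$$

Proof

If u(X) = (X − μ)² and c = k²σ², then using equation (2.4.11),

$$P[(X - \mu)^2 \ge k^2\sigma^2] \le \frac{E(X - \mu)^2}{k^2\sigma^2} = \frac{1}{k^2}$$

and the result follows, because (X − μ)² ≥ k²σ² exactly when |X − μ| ≥ kσ.
An alternative form is

$$P[|X - \mu| < k\sigma] \ge 1 - \frac{1}{k^2} \qquad (2.4.14)$$

and if we let ε = kσ, then

$$P[|X - \mu| < \varepsilon] \ge 1 - \frac{\sigma^2}{\varepsilon^2} \qquad (2.4.15)$$

and

$$P[|X - \mu| \ge \varepsilon] \le \frac{\sigma^2}{\varepsilon^2} \qquad (2.4.16)$$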

Letting k = 2, we see that a random variable will be within two standard
deviations of its mean with probability at least 0.75. Although this may not be a tight
bound in all cases, it is surprising that such a bound can be found to hold for all
possible discrete and continuous distributions. A tighter bound, in general,
cannot be obtained, as shown in the following example.


Example 2.4.2
Suppose that X takes on the values −1, 0, and 1 with probabilities 1/8, 6/8, and
1/8, respectively. Then μ = 0 and σ² = 1/4. For k = 2,

$$P[-2(0.5) < X - 0 < 2(0.5)] = P[-1 < X < 1] = P[X = 0] = \frac{3}{4} = 1 - \frac{1}{2^2}$$

so the bound (2.4.14) is attained with equality.
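The equality case can be confirmed with exact arithmetic. This sketch compares squared deviations with k²σ², which is equivalent to |X − μ| < kσ but avoids square roots.

```python
from fractions import Fraction

# Distribution of Example 2.4.2.
pmf = {-1: Fraction(1, 8), 0: Fraction(6, 8), 1: Fraction(1, 8)}
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

k = 2
# (X - mu)^2 < k^2 var is the same event as |X - mu| < k sigma.
inside = sum(p for x, p in pmf.items() if (x - mu) ** 2 < k ** 2 * var)
lower_bound = 1 - Fraction(1, k ** 2)   # Chebychev lower bound (2.4.14)

print(mu, var, inside, lower_bound)
```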


It also is possible to show that if the variance is zero, the distribution is con- 
centrated at a single value. Such a distribution is called a degenerate distribution. 


Let μ = E(X) and σ² = Var(X). If σ² = 0, then P[X = μ] = 1.


Proof 


If x ≠ μ for some observed value x, then |x − μ| ≥ 1/i for some integer i ≥ 1, and
conversely. Thus,

$$[X \ne \mu] = \bigcup_{i=1}^{\infty} \left[|X - \mu| \ge \frac{1}{i}\right]$$

and using Boole’s inequality, equation (1.4.5), we have

$$P[X \ne \mu] \le \sum_{i=1}^{\infty} P\left[|X - \mu| \ge \frac{1}{i}\right]$$

and using equation (2.4.16) with ε = 1/i we obtain

$$P[X \ne \mu] \le \sum_{i=1}^{\infty} i^2\sigma^2 = 0$$

which implies that P[X = μ] = 1.


APPROXIMATE MEAN AND VARIANCE 


If a function of a random variable, say H(X), can be expanded in a Taylor series,
then an expression for the approximate mean and variance of H(X) can be
obtained in terms of the mean and variance of X.

For example, suppose that H(x) has derivatives H′(x), H″(x), ... in an open
interval containing μ = E(X). The function H(x) has a Taylor approximation
about μ,

$$H(x) \approx H(\mu) + H'(\mu)(x - \mu) + \tfrac{1}{2}H''(\mu)(x - \mu)^2 \qquad (2.4.17)$$
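which, after taking expectations term by term and using E(X − μ) = 0 and
E(X − μ)² = σ², suggests the approximation

$$E[H(X)] \approx H(\mu) + \tfrac{1}{2}H''(\mu)\sigma^2 \qquad (2.4.18)$$

and, using the first two terms,

$$\operatorname{Var}[H(X)] \approx [H'(\mu)]^2\sigma^2 \qquad (2.4.19)$$

where σ² = Var(X).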
The accuracy of these approximations depends primarily on the nature of the 
function H(x) as well as on the amount of variability in the distribution of X. 


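Example 2.4.3 Let X be a positive-valued random variable, and let H(x) = ln x, so that
H′(x) = 1/x and H″(x) = −1/x². It follows that

$$E[\ln X] \approx \ln \mu + \frac{1}{2}\left(-\frac{1}{\mu^2}\right)\sigma^2 = \ln \mu - \frac{\sigma^2}{2\mu^2}$$

and

$$\operatorname{Var}[\ln X] \approx \left(\frac{1}{\mu}\right)^2\sigma^2 = \frac{\sigma^2}{\mu^2}$$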
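To get a feel for the accuracy of these approximations, the sketch below compares them with exact values for a hypothetical X (not from the text) that takes the values 9, 10, and 11 with equal probability, so that μ = 10 and σ² = 2/3. Because X varies little relative to its mean, the approximations should be close.

```python
import math

values = [9, 10, 11]        # hypothetical X, equally likely (not from the text)
p = 1.0 / len(values)

mu = sum(x * p for x in values)
sigma2 = sum((x - mu) ** 2 * p for x in values)

# Exact moments of H(X) = ln X.
exact_mean = sum(math.log(x) * p for x in values)
exact_var = sum((math.log(x) - exact_mean) ** 2 * p for x in values)

# Approximations from Example 2.4.3, based on equations (2.4.18) and (2.4.19).
approx_mean = math.log(mu) - sigma2 / (2 * mu ** 2)
approx_var = sigma2 / mu ** 2

print(f"E[ln X]:   exact {exact_mean:.6f}, approx {approx_mean:.6f}")
print(f"Var[ln X]: exact {exact_var:.6f}, approx {approx_var:.6f}")
```

Spreading the values further from 10 (for example 1, 10, 19) makes the neglected higher-order Taylor terms matter and the agreement deteriorates.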


MOMENT GENERATING FUNCTIONS 


A special expected value that is quite useful is known as the moment generating 
function. 


Definition 2.5.1


If X is a random variable, then the expected value 


$$M_X(t) = E(e^{tX}) \qquad (2.5.1)$$

is called the moment generating function (MGF) of X if this expected value exists for
all values of t in some interval of the form −h < t < h for some h > 0.


In some situations it is desirable to suppress the subscript and use the simpler 
notation M(t). 


Example 2.5.1
Assume that X is a discrete finite-valued random variable with possible values $x_1, \ldots, x_n$. The MGF is

$$M_X(t) = \sum_{i=1}^{n} e^{tx_i} f_X(x_i)$$

which is a differentiable function of t, with derivative

$$M_X'(t) = \sum_{i=1}^{n} x_i e^{tx_i} f_X(x_i)$$

and, in general, for any positive integer r,

$$M_X^{(r)}(t) = \sum_{i=1}^{n} x_i^r e^{tx_i} f_X(x_i)$$

Notice that if we evaluate $M_X^{(r)}(t)$ at t = 0 we obtain

$$M_X^{(r)}(0) = \sum_{i=1}^{n} x_i^r f_X(x_i) = E(X^r)$$

the rth moment about the origin. This also suggests the possibility of expanding
$M_X(t)$ in a power series about t = 0, $M_X(t) = c_0 + c_1 t + c_2 t^2 + \cdots$, where
$c_r = E(X^r)/r!$.
These properties hold for any random variable for which an MGF exists,
although a general proof is somewhat harder.


If the MGF of X exists, then

$$E(X^r) = M_X^{(r)}(0) \qquad \text{for all } r = 1, 2, \ldots \qquad (2.5.2)$$

and

$$M_X(t) = 1 + \sum_{r=1}^{\infty} \frac{E(X^r)\,t^r}{r!} \qquad (2.5.3)$$
Proof 


We will consider the case of a continuous random variable X. The MGF for a
continuous random variable is

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx$$

When the MGF exists, it can be shown that the rth derivative exists, and it can
be obtained by differentiating under the integral sign,

$$M_X^{(r)}(t) = \int_{-\infty}^{\infty} x^r e^{tx} f_X(x)\,dx$$

from which it follows that for all r = 1, 2, ...

$$E(X^r) = \int_{-\infty}^{\infty} x^r f_X(x)\,dx = \int_{-\infty}^{\infty} x^r e^{0 \cdot x} f_X(x)\,dx = M_X^{(r)}(0)$$

When the MGF exists, it also can be shown that a power series expansion about
zero is possible, and from standard results about power series, the coefficients
have the form $M_X^{(r)}(0)/r!$. We combine this with the above result to obtain

$$M_X(t) = \sum_{r=0}^{\infty} \frac{M_X^{(r)}(0)\,t^r}{r!} = 1 + \sum_{r=1}^{\infty} \frac{E(X^r)\,t^r}{r!}$$

The discrete case is similar.


Example 2.5.2 Consider a continuous random variable X with pdf f(x) = e^{−x} if x > 0, and zero
otherwise. The MGF is

$$M_X(t) = \int_0^{\infty} e^{tx} e^{-x}\,dx = \int_0^{\infty} e^{-(1-t)x}\,dx = \left.\frac{-e^{-(1-t)x}}{1-t}\right|_0^{\infty} = \frac{1}{1-t} \qquad t < 1$$

The rth derivative is $M_X^{(r)}(t) = r!(1 - t)^{-r-1}$, and thus the rth moment is
$E(X^r) = M_X^{(r)}(0) = r!$. The mean is μ = E(X) = 1! = 1, and the variance is
Var(X) = E(X²) − μ² = 2 − 1 = 1.
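These moments can be checked numerically. The sketch below uses composite Simpson's rule, a standard quadrature rule chosen for illustration, and truncates the infinite range at a hypothetical upper limit of 60, beyond which e^{−x} is negligible at double precision. The truncation makes the numerical MGF unreliable as t approaches 1, where the integrand decays slowly. The helper names are hypothetical.

```python
import math


def simpson(g, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


UPPER = 60.0  # hypothetical truncation of (0, inf); the neglected tail is negligible


def moment(r: int) -> float:
    """E(X^r) for f(x) = e^(-x), x > 0, by numerical integration."""
    return simpson(lambda x: x ** r * math.exp(-x), 0.0, UPPER)


def mgf(t: float) -> float:
    """M_X(t) = E(e^(tX)) by numerical integration; meaningful only for t < 1."""
    return simpson(lambda x: math.exp((t - 1.0) * x), 0.0, UPPER)


for r in range(1, 5):
    print(f"E(X^{r}) ~ {moment(r):.6f}   r! = {math.factorial(r)}")
print(f"M(0.5) ~ {mgf(0.5):.6f}   1/(1 - 0.5) = {1 / (1 - 0.5)}")
```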


Example 2.5.3 A discrete random variable X has pdf $f(x) = (1/2)^{x+1}$ if x = 0, 1, 2, ..., and zero
otherwise. The MGF of X is

$$M_X(t) = \sum_{x=0}^{\infty} e^{tx}(1/2)^{x+1} = (1/2)\sum_{x=0}^{\infty} (e^t/2)^x$$


We make use of the well-known identity for the geometric series,

$$1 + s + s^2 + s^3 + \cdots = \frac{1}{1 - s} \qquad -1 < s < 1$$

with $s = e^t/2$. The resulting MGF is

$$M_X(t) = \frac{1/2}{1 - e^t/2} = \frac{1}{2 - e^t} \qquad t < \ln 2$$

The first derivative is $M_X'(t) = e^t(2 - e^t)^{-2}$, and thus $E(X) = M_X'(0) = e^0(2 - e^0)^{-2} = 1$. It is possible to obtain higher derivatives, but the complexity
increases with the order of the derivative.


PROPERTIES OF MOMENT GENERATING FUNCTIONS


Theorem 2.5.2

If Y = aX + b, then $M_Y(t) = e^{bt}M_X(at)$.
Proof

$$\begin{aligned} M_Y(t) &= E(e^{tY}) \\ &= E(e^{t(aX + b)}) \\ &= E(e^{atX} e^{bt}) \\ &= e^{bt} E(e^{atX}) \\ &= e^{bt} M_X(at) \end{aligned}$$
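A quick numerical check of $M_Y(t) = e^{bt}M_X(at)$, using a small hypothetical discrete distribution and constants chosen only for illustration (they do not appear in the text):

```python
import math

# Hypothetical distribution of X, chosen only for illustration.
pmf_x = {0: 0.2, 1: 0.5, 3: 0.3}
a, b = 2.0, -1.5                                  # arbitrary constants, a != 0
pmf_y = {a * x + b: p for x, p in pmf_x.items()}  # distribution of Y = aX + b


def mgf(pmf: dict, t: float) -> float:
    """M(t) = sum over x of e^(tx) f(x) for a finite discrete distribution."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())


for t in (-0.4, 0.0, 0.25, 0.7):
    direct = mgf(pmf_y, t)                        # M_Y(t) from Y's own pmf
    via_x = math.exp(b * t) * mgf(pmf_x, a * t)   # e^(bt) M_X(at)
    print(f"t = {t:5.2f}: {direct:.10f} {via_x:.10f}")
```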


One possible application is in computing the rth moment about the mean,
$E[(X - \mu)^r]$. Because $M_{X-\mu}(t) = e^{-\mu t}M_X(t)$,

$$E[(X - \mu)^r] = \left.\frac{d^r}{dt^r}\left[e^{-\mu t}M_X(t)\right]\right|_{t=0} \qquad (2.5.4)$$


It can be shown that MGFs uniquely determine a distribution. 


Suppose, for example, that X and Y are both integer-valued with the same set of
possible values, say 0, 1, and 2, and that X and Y have the same MGF,

$$M(t) = \sum_{x=0}^{2} e^{tx} f_X(x) = \sum_{y=0}^{2} e^{ty} f_Y(y)$$

If we let $s = e^t$ and $c_i = f_X(i) - f_Y(i)$ for i = 0, 1, 2, then we have
$c_0 + c_1 s + c_2 s^2 = 0$ for all s > 0. A polynomial of degree at most 2 that vanishes
at infinitely many points must be identically zero, so the only possible coefficients
are $c_0 = c_1 = c_2 = 0$, which implies that $f_X(i) = f_Y(i)$ for i = 0, 1, 2, and
consequently X and Y have the same distribution.

In other words, X and Y cannot have the same MGF but different pdf's. Thus,
the form of the MGF determines the form of the pdf.


This is true in general, although it is harder to prove.


CHAPTER 2 RANDOM VARIABLES AND THEIR DISTRIBUTIONS


Theorem 2.5.3

Uniqueness If $X_1$ and $X_2$ have respective CDFs $F_1(x)$ and $F_2(x)$, and MGFs
$M_1(t)$ and $M_2(t)$, then $F_1(x) = F_2(x)$ for all real x if and only if $M_1(t) = M_2(t)$ for
all t in some interval −h < t < h for some h > 0.


For nonnegative integer-valued random variables, the derivation of moments
often is made more tractable by first considering another type of expectation
known as a factorial moment.


FACTORIAL MOMENTS


Definition 2.5.2

The rth factorial moment of X is

$$E[X(X - 1)\cdots(X - r + 1)] \qquad (2.5.5)$$

and the factorial moment generating function (FMGF) of X is

$$G_X(t) = E(t^X) \qquad (2.5.6)$$

if this expectation exists for all t in some interval of the form 1 − h < t < 1 + h.


The FMGF is more tractable than the MGF in some problems.

Also note that the FMGF sometimes is called the probability generating
function. This is because for nonnegative integer-valued random variables X,
$P[X = r] = G_X^{(r)}(0)/r!$, which means that the FMGF uniquely determines the
distribution. Also note the following relationship between the FMGF and MGF:

$$G_X(t) = E(t^X) = E(e^{X \ln t}) = M_X(\ln t)$$


Theorem 2.5.4

If X has a FMGF, $G_X(t)$, then

$$G_X'(1) = E(X) \qquad (2.5.7)$$

$$G_X''(1) = E[X(X - 1)] \qquad (2.5.8)$$

$$G_X^{(r)}(1) = E[X(X - 1)\cdots(X - r + 1)] \qquad (2.5.9)$$

Proof
See Exercise 35.


It is possible to compute regular moments from factorial moments. For
example, notice that E[X(X − 1)] = E(X² − X) = E(X²) − E(X), so that

$$E(X^2) = E(X) + E[X(X - 1)] \qquad (2.5.10)$$


Example 2.5.4

We consider the discrete distribution of Example 2.5.3. The FMGF of X is

$$G_X(t) = M_X(\ln t) = \frac{1}{2 - t} \qquad t < 2$$

Notice that higher derivatives are easily obtained for the FMGF, which was
not the case for the MGF. In particular, the rth derivative is

$$G_X^{(r)}(t) = r!(2 - t)^{-r-1}$$

Consequently, $E(X) = G_X'(1) = 1!(2 - 1)^{-2} = 1$, and $E[X(X - 1)] = G_X''(1) = 2!(2 - 1)^{-3} = 2$. It follows that E(X²) = E(X) + 2 = 3, and thus Var(X) = 3 − 1² = 2.
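The sketch below checks Example 2.5.4 by summing the series directly. The truncation at 200 terms is a hypothetical choice; with probabilities (1/2)^{x+1} the neglected tails are negligible at double precision for the sums used here. It also compares a truncated series for $G_X(1.5)$ with the closed form 1/(2 − t). The helper names are hypothetical.

```python
from fractions import Fraction

TERMS = 200  # hypothetical truncation point for the infinite sums


def pmf(x: int) -> Fraction:
    """pmf of Example 2.5.3: f(x) = (1/2)^(x+1) for x = 0, 1, 2, ..."""
    return Fraction(1, 2 ** (x + 1))


def fmgf(t: float) -> float:
    """Truncated series for G_X(t) = E(t^X), to compare with 1/(2 - t)."""
    return sum(t ** x * float(pmf(x)) for x in range(TERMS))


def factorial_moment(r: int) -> float:
    """Truncated E[X(X-1)...(X-r+1)], summing the falling factorial times f(x)."""
    total = Fraction(0)
    for x in range(TERMS):
        falling = 1
        for j in range(r):
            falling *= x - j   # builds x(x-1)...(x-r+1)
        total += falling * pmf(x)
    return float(total)


print(fmgf(1.5), 1 / (2 - 1.5))          # both close to 2
for r in range(1, 4):
    print(r, factorial_moment(r))         # G^(r)(1) = r!, so 1, 2, 6

mean = factorial_moment(1)
second = mean + factorial_moment(2)       # E(X^2) = E(X) + E[X(X-1)], equation (2.5.10)
print("Var(X) =", second - mean ** 2)     # 3 - 1 = 2
```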


SUMMARY 


The purpose of this chapter was to develop a mathematical structure for
expressing a probability model for the possible outcomes of an experiment when these
outcomes cannot be predicted deterministically. A random variable, which is a
real-valued function defined on a sample space, and the associated probability
density function (pdf) provide a reasonable approach to assigning probabilities
when the outcomes of an experiment can be quantified. Random variables often
can be classified as either discrete or continuous, and the method of assigning
probability to a real event A involves summing the pdf over values of A in the
discrete case, and integrating the pdf over the set A in the continuous case. The
cumulative distribution function (CDF) provides a unified approach for
expressing the distribution of probability to the possible values of the random variable.

The moments are special expected values, which include the mean and variance 
as particular cases, and also provide descriptive measures for other characteristics 
such as skewness of a distribution. 

Bounds for the probabilities of certain types of events can be expressed in
terms of expected values. An important bound of this sort is given by the
Chebychev inequality.


EXERCISES 


1. Let e = (i, j) represent an arbitrary outcome resulting from two rolls of the four-sided die
of Example 2.1.1. Tabulate the discrete pdf and sketch the graph of the CDF for the
following random variables:

(a) Y(e) = i + j.

(b) Z(e) = i − j.

(c) W(e) = (i − j)².
2. A game consists of first rolling an ordinary six-sided die once and then tossing an
unbiased coin once. The score, which consists of adding the number of spots showing on
the die to the number of heads showing on the coin (0 or 1), is a random variable, say X.
List the possible values of X and tabulate the values of:

(a) the discrete pdf.

(b) the CDF at its points of discontinuity.

(c) Sketch the graph of the CDF.

(d) Find P[X > 3].

(e) Find the probability that the score is an odd integer.


3. A bag contains three coins, one of which has a head on both sides while the other two
coins are normal. A coin is chosen at random from the bag and tossed three times. The
number of heads is a random variable, say X.
(a) Find the discrete pdf of X. (Hint: Use the Law of Total Probability with B₁ = a
normal coin and B₂ = two-headed coin.)

(b) Sketch the discrete pdf and the CDF of X.


4. A box contains five colored balls, two black and three white. Balls are drawn successively
without replacement. If X is the number of draws until the last black ball is obtained, find
the discrete pdf f(x).


5. A discrete random variable has pdf f(x).
(a) If f(x) = k(1/2)^x for x = 1, 2, 3, and zero otherwise, find k.
(b) Is a function of the form f(x) = k[(1/2)^x − 1/2] for x = 0, 1, 2 a pdf for any k?


6. Denote by [x] the greatest integer not exceeding x. For the pdf in Example 2.2.1, show
that the CDF can be represented as F(x) = ([x]/12)² for 0 ≤ x < 13, zero if x < 0, and
one if x ≥ 13.


7. A discrete random variable X has a pdf of the form f(x) = c(8 − x) for x = 0, 1, 2, 3, 4, 5,
and zero otherwise. 
(a) Find the constant c. 
(b) Find the CDF, F(x). 
(c) Find P[X > 2]. 
(d) Find E(X). 
8. A nonnegative integer-valued random variable X has a CDF of the form
F(x) = 1 − (1/2)^(x+1) for x = 0, 1, 2, ... and zero if x < 0.
(a) Find the pdf of X. 
(b) Find P[10 < X < 20]. 
(c) Find P[X is even]. 
9. Sometimes it is desirable to assign numerical “code” values to experimental responses that
are not basically of numerical type. For example, in testing the color preferences of
experimental subjects, suppose that the colors blue, green, and red occur with probabilities
1/4, 1/4, and 1/2, respectively. A different integer value is assigned to each color, and this
corresponds to a random variable X that can take on one of these three integer values.
(a) Can f(x) = (1/4)^|x| (1/2)^(1−|x|) for x = −1, 1, 0 be used as a pdf for this experiment?

(b) Can f(x) = jor for x = 0, 1, 2 be used?
(c) Can f(x) = (1 − x)/4 for x = −1, 0, 2 be used?


10. Let X be a discrete random variable such that P[X = x] > 0 if x = 1, 2, 3, or
4, and P[X = x] = 0 otherwise. Suppose the CDF is F(x) = 0.05x(1 + x) at the values
x = 1, 2, 3, or 4.

(a) Sketch the graph of the CDF.

(b) Sketch the graph of the discrete pdf f(x).

(c) Find E(X).


11. A player rolls a six-sided die and receives a number of dollars corresponding to the
number of dots on the face that turns up. What amount should the player pay for rolling
to make this a “fair” game?


12. A continuous random variable X has pdf given by f(x) = c(1 − x)x² if 0 < x < 1 and
zero otherwise.

(a) Find the constant c.

(b) Find E(X).


13. A function f(x) has the following form:

f(x) = kx^−(k+1)     1 < x < ∞

and zero otherwise.
(a) For what values of k is f(x) a pdf?
(b) Find the CDF based on (a).
(c) For what values of k does E(X) exist?


14. Determine whether each of the following functions could be a CDF over the indicated part
of the domain:

(a) F(x) = e^(−x); 0 ≤ x < ∞.

(b) F(x) = e^x; −∞ < x < 0.

(c) F(x) = 1 − e^(−x); −1 < x < ∞.


15. Find the pdf corresponding to each of the following CDFs:
(a) F(x) = (x² + 2x + 1)/16; −1 < x < 3.
(b) F(x) = 1 − e^(−λx) − λx e^(−λx); 0 < x < ∞; λ > 0.


16. If fᵢ(x), i = 1, 2, ..., n, are pdf's, show that

p₁f₁(x) + p₂f₂(x) + ... + pₙfₙ(x)

is a pdf, where pᵢ ≥ 0 and p₁ + p₂ + ... + pₙ = 1.

17. A random variable X has a CDF such that

F(x) = x/2       0 < x < 1
       ce hale   1 < x < 3/2

(a) Graph F(x).

(b) Graph the pdf f(x).

(c) Find P[X < 1/2].

(d) Find P[X > 1/2].

(e) Find P[X < 1.25].

(f) What is P[X = 1.25]?


18. A continuous random variable X has a pdf of the form f(x) = 2x/9 for 0 < x < 3, and
zero otherwise.

(a) Find the CDF of X.

(b) Find P[X < 2].

(c) Find P[−1 < X < 1.5].

(d) Find a number m such that P[X < m] = P[X > m].
(e) Find E(X).


19. A random variable X has the pdf

f(x) = x²      if 0 < x < 1
       2/3     if 1 < x < 2
       0       otherwise

(a) Find the median of X.
(b) Sketch the graph of the CDF and show the position of the median on the graph.


20. A continuous random variable X has CDF given by

F(x) = 0                 if x < 1
       2(x − 2 + 1/x)    if 1 ≤ x < 2
       1                 if 2 ≤ x

(a) Find the 100 × pth percentile of the distribution with p = 1/3.
(b) Find the pdf of X.


21. Verify that the following function has the four properties of Theorem 2.2.3, and find the
points of discontinuity, if any:

F(x) = 0.25e^x       if −∞ < x < 0
       0.5           if 0 ≤ x < 1
       1 − e^(−x)    if 1 ≤ x < ∞


22. For the CDF, F(x), of Exercise 21, find a CDF of discrete type, F_d(x), a CDF of
continuous type, F_c(x), and a number 0 < a < 1 such that

F(x) = aF_d(x) + (1 − a)F_c(x)


23. Let X be a random variable with discrete pdf f(x) = x/8 if x = 1, 2, 5, and zero otherwise.
Find:

(a) E(X).
(b) Var(X).
(c) E(2X + 3).


24. Let X be continuous with pdf f(x) = 3x² if 0 < x < 1, and zero otherwise. Find:
(a) E(X).
(b) Var(X).
(c) E(X^r).
(d) E(3X − 5X² + 1).


25. Let X be continuous with pdf f(x) = 1/x² if 1 < x < ∞, and zero otherwise.
(a) Does E(X) exist?
(b) Does E(1/X) exist?
(c) For what values of k does E(X^k) exist?


26. At a computer store, the annual demand for a particular software package is a discrete
random variable X. The store owner orders four copies of the package at $10 per copy
and charges customers $35 per copy. At the end of the year the package is obsolete and
the owner loses the investment on unsold copies. The pdf of X is given by the following
table:

x       0    1    2    3    4
f(x)    143. 3 #2 A

(a) Find E(X).

(b) Find Var(X).

(c) Express the owner’s net profit Y as a linear function of X, and find E(Y) and
Var(Y).


27. The measured radius of a circle, R, has pdf f(r) = 6r(1 − r), 0 < r < 1. Find:
(a) the expected value of the radius.
(b) the expected circumference.
(c) the expected area.


28. Prove Theorem 2.4.5 for the continuous case. Hint: Use the transformation y = x − μ in
the integral and note that g(y) = y³f(μ + y) is an odd function of y.


29. Consider the discrete random variable X with pdf given by the following table:

x       −3     −1     0                2      2√2
f(x)    1/4    1/4    (6 − 3√2)/16     1/8    3√2/16

The distribution of X is not symmetric. Why? Show that μ₃ = 0.
Let X be a nonnegative continuous random variable with CDF F(x) and $E(X) < \infty$. Use integration by parts to show that

$$E(X) = \int_0^\infty [1 - F(x)]\,dx$$

Note: For any continuous random variable with $E(|X|) < \infty$, this result extends to

$$E(X) = -\int_{-\infty}^{0} F(x)\,dx + \int_0^\infty [1 - F(x)]\,dx$$


(a) Use Chebychev's inequality to obtain a lower bound on P[5/8 < X < 7/8] in Exercise 24. Is this a useful bound?
(b) Rework (a) for the probability P[1/2 < X < 1].
(c) Compare this bound to the exact probability.


Consider the random variable X of Example 2.1.1, which represents the largest of two numbers that occur on two rolls of a four-sided die.


(a) Find the expected value of X.
(b) Find the variance of X.


Suppose $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Find the approximate mean and variance of:
(a) $e^X$.
(b) 1/X (assuming $\mu \ne 0$).
(c) ln(X) (assuming X > 0).


Suppose that X is a random variable with MGF $M(t) = (1/8)e^{t} + (1/4)e^{2t} + (5/8)e^{5t}$.
(a) What is the distribution of X?
(b) What is P[X = 2]?


Prove Theorem 2.5.4 for a nonnegative integer-valued random variable X. 


Assume that X is a continuous random variable with pdf $f(x) = \exp[-(x + 2)]$ if $-2 < x < \infty$, and zero otherwise.
(a) Find the moment generating function of X.
(b) Use the MGF of (a) to find E(X) and $E(X^2)$.
Use the FMGF of Example 2.5.5 to find E[X(X − 1)(X − 2)], and then find $E(X^3)$.


In Exercise 26, suppose instead of ordering four copies of the software package, the store 
owner orders c copies (0 < c < 4). Then the number sold, say S, is the smaller of c or X. 


(a) Express the net profit Y as a linear function of S. 


(b) Find E(Y) for each value of c and indicate the solution c that maximizes the 
expected profit. 


Show that $\sigma^2 = E[X(X - 1)] - \mu(\mu - 1)$.
Let $\psi(t) = \ln[M_X(t)]$, where $M_X(t)$ is an MGF. The function ψ(t) is called the cumulant generating function of X, and the value of the rth derivative evaluated at t = 0, $\kappa_r = \psi^{(r)}(0)$, is called the rth cumulant of X.

(a) Show that $\mu = \psi'(0)$.

(b) Show that $\sigma^2 = \psi''(0)$.

(c) Use ψ(t) to find μ and σ² for the random variable of Exercise 36.

(d) Use ψ(t) to find μ and σ² for the random variable of Example 2.5.5.


SPECIAL PROBABILITY DISTRIBUTIONS


3.1 INTRODUCTION


Our purpose in this chapter is to develop some special probability distributions. In many applications, by recognizing certain characteristics, it is possible to determine that the distribution has a known special form. Typically, a special distribution will depend on one or more parameters, and once the numerical value of each parameter has been ascertained, the distribution is completely determined.

Special discrete distributions will be derived using the counting techniques of 
Chapter 1. Special continuous distributions also will be presented, and relation- 
ships between various special distributions will be discussed. 
3.2 SPECIAL DISCRETE DISTRIBUTIONS


We will use the counting techniques of Chapter 1 to derive special discrete distributions.


BERNOULLI DISTRIBUTION 


On a single trial of an experiment, suppose that there are only two events of interest, say E and its complement E'. For example, E and E' could represent the occurrence of a “head” or a “tail” on a single coin toss, obtaining a “defective” or a “good” item when drawing a single item from a manufactured lot, or, in general, “success” or “failure” on a particular trial of an experiment. Suppose that E occurs with probability p = P(E), and consequently E' occurs with probability q = P(E') = 1 − p.

A random variable, X, that assumes only the values 0 or 1 is known as a Bernoulli variable, and a performance of an experiment with only two types of outcomes is called a Bernoulli trial. In particular, if an experiment can result only in “success” (E) or “failure” (E'), then the corresponding Bernoulli variable is

$$X(e) = \begin{cases} 1 & \text{if } e \in E \\ 0 & \text{if } e \in E' \end{cases} \qquad (3.2.1)$$

The pdf of X is given by f(0) = q and f(1) = p. The corresponding distribution is known as a Bernoulli distribution, and its pdf can be expressed as

$$f(x) = p^x q^{1-x}, \qquad x = 0, 1 \qquad (3.2.2)$$


In Example 2.1.1, we considered rolls of a four-sided die. A bet is placed that a 1 
will occur on a single roll of the die. Thus, E = {1}, E’ = {2, 3, 4}, and p = 1/4. 

In an earlier example, we considered drawing marbles at random from a collection of 10 black and 20 white marbles. In such a problem, we might regard “black” as success and “white” as failure, or vice versa, in a single draw. If obtaining a black marble is regarded as a success, then p = 10/30 = 1/3 and q = 20/30 = 2/3.


Notice that $E(X) = 0 \cdot q + 1 \cdot p = p$ and $E(X^2) = 0^2 \cdot q + 1^2 \cdot p = p$, so that $\mathrm{Var}(X) = p - p^2 = p(1 - p) = pq$.
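As a small check of these two moments, the sketch below (an illustration added here, not part of the text) evaluates the Bernoulli pdf (3.2.2) with exact rational arithmetic from Python's `fractions` module. The helper name `bernoulli_pdf` is hypothetical, and p = 1/4 is taken from the four-sided-die example.

```python
from fractions import Fraction


def bernoulli_pdf(x, p):
    """Bernoulli pdf (3.2.2): f(x) = p^x q^(1 - x) for x = 0, 1, and zero otherwise."""
    if x not in (0, 1):
        return Fraction(0)
    return p**x * (1 - p) ** (1 - x)


p = Fraction(1, 4)  # P(E) for E = {1} on one roll of a four-sided die
mean = sum(x * bernoulli_pdf(x, p) for x in (0, 1))
second_moment = sum(x**2 * bernoulli_pdf(x, p) for x in (0, 1))
variance = second_moment - mean**2

print(mean, variance)  # 1/4 3/16
```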

An important distribution arises from counting the number of successes on a 
fixed number of independent Bernoulli trials. 
BINOMIAL DISTRIBUTION


Often it is possible to structure a more complicated experiment as a sequence of 
independent Bernoulli trials, where the quantity of interest is the number of suc- 
cesses on a certain number of trials. 


Sampling with Replacement In Example 1.6.16, we considered the problem of drawing five marbles from a collection of 10 black and 20 white marbles, where the marbles are drawn one at a time, and each time replaced before the next draw. We shall let X be the number of black marbles drawn, and consider f(2) = P[X = 2]. To draw exactly two black (B), and consequently three white (W), it would be necessary to obtain some permutation of two B's and three W's, BBWWW, BWBWW, and so on (see Figure 1.12 for a complete listing). There are $\binom{5}{2} = 10$ possible permutations of this type, and each one has the same probability of occurrence, namely $(10/30)^2(20/30)^3$, which is the product of two values of P(B) = 10/30 and three values of P(W) = 20/30, multiplied together in some order. The probability can be obtained this way because draws made with replacement can be regarded as independent Bernoulli trials. Thus,

$$f(2) = \binom{5}{2}\left(\frac{10}{30}\right)^2\left(\frac{20}{30}\right)^3$$

which agrees with the solution (1.6.11).
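The counting argument can also be checked by brute force. This sketch, which assumes only the Python standard library, lists all 2^5 = 32 color sequences for five draws with replacement, multiplies P(B) = 10/30 or P(W) = 20/30 along each sequence (the draws being independent), and adds the probabilities of the sequences that contain exactly two B's.

```python
from fractions import Fraction
from itertools import product
from math import comb

P_B = Fraction(10, 30)  # probability of black on each draw (with replacement)
P_W = Fraction(20, 30)  # probability of white on each draw

total = Fraction(0)
count = 0
for seq in product("BW", repeat=5):
    if seq.count("B") == 2:
        count += 1
        prob = Fraction(1)
        for color in seq:  # independent draws: multiply along the sequence
            prob *= P_B if color == "B" else P_W
        total += prob

closed_form = comb(5, 2) * P_B**2 * P_W**3
print(count, total, closed_form)  # 10 80/243 80/243
```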


This approach can be used to derive the more general binomial distribution. In a sequence of n independent Bernoulli trials with probability of success p on each trial, let X represent the number of successes. The discrete pdf of X is given by

$$b(x; n, p) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, \ldots, n \qquad (3.2.3)$$

For the event [X = x] to occur, it is necessary to have some permutation of x successes (E) and n − x failures (E'). There are $\binom{n}{x}$ such permutations, and each occurs with probability $p^x q^{n-x}$, which is the product of x values of p = P(E) and n − x values of q = P(E'). Of course, the order of multiplication is unimportant,
and formula (3.2.3), which is known as the pdf of the binomial distribution, is 
established. The notation b(x; n, p), which we have used instead of f(x), reflects 
the dependence on the parameters n and p. 

The general properties (2.2.2) and (2.2.3) are satisfied by equation (3.2.3), because 0 ≤ p ≤ 1 and

$$\sum_{x=0}^{n} b(x; n, p) = \sum_{x=0}^{n}\binom{n}{x} p^x q^{n-x} = (p + q)^n = 1^n = 1 \qquad (3.2.4)$$
The CDF of a binomial distribution is given at integer values by

$$B(x; n, p) = \sum_{k=0}^{x} b(k; n, p), \qquad x = 0, 1, \ldots, n \qquad (3.2.5)$$

Some values of B(x; n, p) are provided in Table 1 in Appendix C for various values of n and p. The following identity is easily verified, because the number of failures, n − X, has the binomial distribution with parameters n and 1 − p:

$$B(x; n, p) = 1 - B(n - x - 1; n, 1 - p) \qquad (3.2.6)$$

Values of the pdf can be obtained easily from Table 1 because

$$b(x; n, p) = B(x; n, p) - B(x - 1; n, p) \qquad (3.2.7)$$

A short notation to designate that X has the binomial distribution with parameters n and p is X ~ b(x; n, p) or an alternative notation

$$X \sim \mathrm{BIN}(n, p) \qquad (3.2.8)$$
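A minimal sketch of (3.2.3), (3.2.5), and the identities (3.2.6) and (3.2.7), assuming exact arithmetic with `fractions.Fraction` so that the identities can be checked with `==` rather than a tolerance. The function names `b` and `B` only mirror the notation b(x; n, p) and B(x; n, p); the choice n = 10, p = 3/10 is arbitrary.

```python
from fractions import Fraction
from math import comb


def b(x, n, p):
    """Binomial pdf (3.2.3): C(n, x) p^x q^(n - x) for x = 0, ..., n, else 0."""
    if not 0 <= x <= n:
        return 0 * p  # a zero of the same numeric type as p
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def B(x, n, p):
    """Binomial CDF (3.2.5): b(0; n, p) + ... + b(x; n, p); zero for x < 0."""
    return sum((b(k, n, p) for k in range(x + 1)), 0 * p)


n, p = 10, Fraction(3, 10)

# (3.2.4): the pdf sums to one.
assert B(n, n, p) == 1
# (3.2.6): B(x; n, p) = 1 - B(n - x - 1; n, 1 - p).
assert all(B(x, n, p) == 1 - B(n - x - 1, n, 1 - p) for x in range(n + 1))
# (3.2.7): b(x; n, p) = B(x; n, p) - B(x - 1; n, p).
assert all(b(x, n, p) == B(x, n, p) - B(x - 1, n, p) for x in range(n + 1))
```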


The binomial distribution arises in connection with many games of chance, such as rolling dice or tossing coins.


A coin is tossed independently n times. Denote by p = P(H) the probability of obtaining a head on a single toss. If p = 1/2 we say that the coin is fair or unbiased; otherwise it is said to be biased. For example, if X is the number of heads obtained by tossing an unbiased coin 20 times, then X ~ BIN(20, 1/2). There is a connection between this example and Example 1.6.14, which dealt with randomly choosing the answers to a 20-question true-false test. If the questions were answered according to the results of 20 coin tosses, then the distribution of the number of correct answers also would be BIN(20, 1/2). Thus, the probability of exactly 80% of the answers being correct would be

$$b\left(16; 20, \tfrac{1}{2}\right) = \binom{20}{16}\left(\frac{1}{2}\right)^{16}\left(\frac{1}{2}\right)^{4} = 0.0046$$

which also was obtained in Example 1.6.14 by using counting techniques and the classical approach. In applications where p ≠ 1/2, the classical approach no longer works because the permutations of successes and failures are not all equally likely, although the binomial distribution still applies.

For example, suppose that, instead of a true-false test, a 20-question multiple- 
choice test with four choices per question is answered at random. This could be 
carried out by rolling the four-sided die of Example 3.2.1 20 times, and the dis- 
tribution of the number of correct answers would be the same as the distribution 
of occurrences of any particular one of the values 1, 2, 3, or 4. The appropriate 
distribution of either variable would be BIN(20, 1/4). In this case, the probability 
of exactly 80% correct is 


$$b\left(16; 20, \tfrac{1}{4}\right) = \binom{20}{16}\left(\frac{1}{4}\right)^{16}\left(\frac{3}{4}\right)^{4} = 0.00000036$$
Actually, even the probability of 50% correct is rather small, namely

$$b\left(10; 20, \tfrac{1}{4}\right) = \binom{20}{10}\left(\frac{1}{4}\right)^{10}\left(\frac{3}{4}\right)^{10} = 0.0099$$

The probability of at most 50% correct is B(10; 20, 1/4) = 0.9961.
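The probabilities quoted for random guessing can be reproduced directly from (3.2.3) and (3.2.5). This is an illustrative sketch in floating point; the variable names are hypothetical. It sums the pdf rather than reading Table 1, and the unrounded value of B(10; 20, 1/4) is about 0.99606.

```python
from math import comb


def b(x, n, p):
    """Binomial pdf b(x; n, p) from (3.2.3)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def B(x, n, p):
    """Binomial CDF B(x; n, p) from (3.2.5)."""
    return sum(b(k, n, p) for k in range(x + 1))


true_false_80 = b(16, 20, 0.5)         # 16 of 20 correct, guessing on a true-false test
multiple_80 = b(16, 20, 0.25)          # 16 of 20 correct, four choices per question
multiple_50 = b(10, 20, 0.25)          # exactly 10 of 20 correct
multiple_at_most_50 = B(10, 20, 0.25)  # at most 10 of 20 correct

print(round(true_false_80, 4), multiple_80, round(multiple_50, 4), round(multiple_at_most_50, 4))
```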


The binomial distribution also is useful in evaluating certain games of chance. 


A game of chance consists of rolling three ordinary six-sided dice. The player bets $1 per game, and wins $1 for each occurrence of the number 6 on any of the dice, retaining the original bet in that case. Thus, the net amount won would be a discrete random variable, say Y, with possible values 1, 2, 3, or −1, where the latter value corresponds to the dollar bet, which would be a net loss if no die shows a 6.

One possible approach would be to work out the distribution of Y and then compute E(Y) directly. Instead we will use the fact that Y is a function of a binomial random variable, X, which is the number of 6's on the three dice. In particular, X ~ BIN(3, 1/6) and Y = u(X), where u(x) is given by

x:       0     1     2     3
u(x):   −1     1     2     3

It follows that

$$E(Y) = E[u(X)] = \sum_{x=0}^{3} u(x)\binom{3}{x}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{3-x} = -1\left(\frac{125}{216}\right) + 1\left(\frac{75}{216}\right) + 2\left(\frac{15}{216}\right) + 3\left(\frac{1}{216}\right) = -\frac{17}{216} \approx -0.08$$


Thus, the expected amount won is actually an expected loss. In other words, if 
the player bets $1 on each play, in the long run he or she should expect to lose 
roughly $8 for every 100 plays. 
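The expectation E[u(X)] can be checked with exact fractions. In this illustrative sketch the dictionary `u` encodes the table of u(x) above; the result is −17/216, about −0.0787 per play, which is roughly 7.87 dollars lost per 100 plays.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)             # probability of a 6 on one die
u = {0: -1, 1: 1, 2: 2, 3: 3}  # net amount won, u(x), when x sixes appear

# E(Y) = E[u(X)] with X ~ BIN(3, 1/6)
expected = sum(u[x] * comb(3, x) * p**x * (1 - p) ** (3 - x) for x in range(4))
print(expected, float(expected))
```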


Now we will derive some general properties of the binomial distribution. If X ~ BIN(n, p), then

$$M_X(t) = \sum_{x=0}^{n} e^{tx}\binom{n}{x}p^x q^{n-x} = \sum_{x=0}^{n}\binom{n}{x}(pe^t)^x q^{n-x} = (pe^t + q)^n, \qquad -\infty < t < \infty \qquad (3.2.9)$$

from the binomial expansion.
We note that $M_X'(t) = n(pe^t + q)^{n-1}pe^t$, and so $M_X'(0) = np$.

It is also possible to derive the variance by first evaluating $E(X^2) = M_X''(0)$. However, we will use this opportunity to illustrate the use of the factorial moment generating function, or FMGF, $G_X(t)$. Specifically, if X ~ BIN(n, p), then

$$G_X(t) = E(t^X) = \sum_{x=0}^{n} t^x\binom{n}{x}p^x q^{n-x} = \sum_{x=0}^{n}\binom{n}{x}(pt)^x q^{n-x} = (pt + q)^n$$

$$G_X'(t) = n(pt + q)^{n-1}p$$

$$G_X''(t) = (n - 1)n(pt + q)^{n-2}p^2$$

Thus, $E(X) = G_X'(1) = np$ and $E[X(X - 1)] = G_X''(1) = (n - 1)np^2$, so that $E(X^2) = np + (n - 1)np^2 = np + (np)^2 - np^2$ and

$$\mathrm{Var}(X) = np + (np)^2 - np^2 - (np)^2 = np(1 - p) = npq$$
The results on the mean, variance, and MGF of the binomial distribution are 
summarized in Appendix B. 
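These results can be checked numerically for particular values of n and p. The sketch below (an illustration, with n = 12 and p = 2/7 chosen arbitrarily) sums directly over the pdf with exact fractions to confirm E(X) = np, E[X(X − 1)] = (n − 1)np², and Var(X) = npq, and compares the defining MGF sum with the closed form (3.2.9) at one value of t in floating point.

```python
import math
from fractions import Fraction

n, p = 12, Fraction(2, 7)
q = 1 - p
pdf = [math.comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]

mean = sum(x * f for x, f in enumerate(pdf))             # E(X)
fact2 = sum(x * (x - 1) * f for x, f in enumerate(pdf))  # E[X(X - 1)]
var = fact2 + mean - mean**2                             # E(X^2) - [E(X)]^2

assert mean == n * p                # G'(1) = np
assert fact2 == (n - 1) * n * p**2  # G''(1) = (n - 1) n p^2
assert var == n * p * q             # Var(X) = npq

# MGF (3.2.9): compare the defining sum with (p e^t + q)^n at one value of t.
t = 0.3
mgf_sum = sum(math.exp(t * x) * float(f) for x, f in enumerate(pdf))
mgf_closed = (float(p) * math.exp(t) + float(q)) ** n
assert math.isclose(mgf_sum, mgf_closed, rel_tol=1e-12)
print(mean, fact2, var, mgf_sum, mgf_closed)
```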


HYPERGEOMETRIC DISTRIBUTION 


In Example 1.6.15, we found the probability of obtaining exactly two black marbles out of five selected at random without replacement from a collection of 10 black and 20 white marbles. This type of problem can be generalized to obtain an important special discrete distribution known as the hypergeometric distribution.


Suppose a population or collection consists of a finite number of items, say N, and there are M items of type 1 and the remaining N − M items are of type 2. Suppose n items are drawn at random without replacement, and denote by X the number of items of type 1 that are drawn. The discrete pdf of X is given by

$$h(x; n, M, N) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} \qquad (3.2.10)$$

The underlying sample space is taken to be the collection of all subsets of size n, of which there are $\binom{N}{n}$, and there are $\binom{M}{x}\binom{N-M}{n-x}$ outcomes that correspond to the event [X = x]. Equation (3.2.10), which is the pdf of the hypergeometric distribution, follows by the classical method of assigning probabilities. The notation h(x; n, M, N), which is used here instead of f(x), reflects the dependence on the parameters n, M, and N. The required properties (2.2.2) and (2.2.3) are clearly satisfied because $\binom{M}{x}\binom{N-M}{n-x}$ counts the number of subsets of size n with exactly x items of type 1, and thus the total number of subsets of size n can be represented either by $\binom{N}{n}$ or by $\sum_{x=0}^{n}\binom{M}{x}\binom{N-M}{n-x}$. It follows that

$$\sum_{x=0}^{n} h(x; n, M, N) = 1$$


In Example 1.6.15, the parameter values are n = 5, N = 30, and M = 10, where black marbles are regarded as type 1. Suppose the number of marbles selected is increased to n = 25, and the probability of obtaining exactly eight black ones is desired. This is

$$h(8; 25, 10, 30) = \frac{\binom{10}{8}\binom{20}{17}}{\binom{30}{25}}$$

Notice that the possible values of X in this case are x = 5, 6, 7, 8, 9, or 10. This is because the selected subset cannot have more than M = 10 black marbles or more than N − M = 20 white marbles; with n = 25, at least 25 − 20 = 5 of the marbles selected must be black. In general, the possible values of X in (3.2.10) are

$$\max(0, n - N + M) \le x \le \min(n, M) \qquad (3.2.11)$$

and h(x; n, M, N) is zero otherwise.
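A minimal sketch of the hypergeometric pdf (3.2.10) with exact fractions; the function name `h` mirrors the text's notation and is otherwise hypothetical. For n = 25, M = 10, N = 30 it confirms that the values of x with positive probability are exactly those in (3.2.11) and that the pdf sums to 1.

```python
from fractions import Fraction
from math import comb


def h(x, n, M, N):
    """Hypergeometric pdf (3.2.10): C(M, x) C(N - M, n - x) / C(N, n)."""
    if not 0 <= x <= n:
        return Fraction(0)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # which gives zero probability outside the range (3.2.11).
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


n, M, N = 25, 10, 30
support = [x for x in range(n + 1) if h(x, n, M, N) > 0]
print(support)
print(h(8, n, M, N), float(h(8, n, M, N)))
```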
The hypergeometric distribution is important in applications such as deciding 
whether to accept a lot of manufactured items. 


Recall Example 1.5.1, in which a box contained 100 microchips, 80 good and 20 defective. The number of defectives in the box is unknown to a purchaser, who decides to select 10 microchips at random without replacement and to consider the microchips in the box acceptable if the 10 items selected include no more than three defectives. The number of defectives selected, X, has the hypergeometric distribution with n = 10, N = 100, and M = 20 (according to Table 1.1), and the probability of the lot being acceptable is

$$P[X \le 3] = \sum_{x=0}^{3}\frac{\binom{20}{x}\binom{80}{10-x}}{\binom{100}{10}} = 0.890$$

Thus, the probability of accepting a lot with 20% defective items is fairly high. Suppose, on the other hand, that the box contained 50 good and 50 defective items. The same acceptance criterion would yield

$$P[X \le 3] = \sum_{x=0}^{3}\frac{\binom{50}{x}\binom{50}{10-x}}{\binom{100}{10}} = 0.159$$

which means, as we might expect, that a lot with a higher percentage of defective items is less likely to be accepted.
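Both acceptance probabilities can be reproduced by summing (3.2.10) directly. In this sketch, `accept_prob` and its parameters are hypothetical names; the function returns P[X ≤ 3] for a sample of 10 drawn without replacement.

```python
from math import comb


def accept_prob(defective, good, sample=10, max_defective=3):
    """P[X <= max_defective] for X ~ HYP(sample, defective, defective + good)."""
    total = defective + good
    favorable = sum(
        comb(defective, x) * comb(good, sample - x) for x in range(max_defective + 1)
    )
    return favorable / comb(total, sample)


print(round(accept_prob(20, 80), 3))  # box with 20% defective
print(round(accept_prob(50, 50), 3))  # box with 50% defective
```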


In the preceding example, probabilities of the form P[X ≤ x] were important. This is the CDF of X, and we will adopt a special notation for the CDF of a hypergeometric distribution, namely

$$H(x; n, M, N) = \sum_{i=0}^{x} h(i; n, M, N) \qquad (3.2.12)$$

Any hypergeometric probability of interest can be expressed in terms of equation (3.2.12). For example, consider the sampling problem of Example 1.6.15, where X is the number of black marbles in a sample of size five. It follows that

$$\begin{aligned} P[X \le 2] &= H(2; 5, 10, 30) \\ P[X = 2] &= P[X \le 2] - P[X \le 1] = H(2; 5, 10, 30) - H(1; 5, 10, 30) \\ P[X > 3] &= 1 - P[X \le 3] = 1 - H(3; 5, 10, 30) \\ P[X \ge 3] &= 1 - P[X \le 2] = 1 - H(2; 5, 10, 30) \end{aligned}$$

and so on.

A short notation to designate that X has the hypergeometric distribution with parameters n, M, and N is

$$X \sim \mathrm{HYP}(n, M, N) \qquad (3.2.13)$$

It can be shown by a straightforward but rather tedious derivation that $E(X) = nM/N$ and

$$\mathrm{Var}(X) = n\,\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N - n}{N - 1}$$

but we will postpone further discussion of this point until Chapter 5, where these quantities will be obtained. The main properties of the hypergeometric distribution are summarized in Appendix B.
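The text postpones the derivation of the hypergeometric mean and variance to Chapter 5. The sketch below does not derive them; it only checks the stated formulas by exact summation for a few parameter choices. The helper `hyp_moments` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def hyp_moments(n, M, N):
    """Exact mean and variance of HYP(n, M, N) by summing over the range (3.2.11)."""
    total = comb(N, n)
    pdf = {
        x: Fraction(comb(M, x) * comb(N - M, n - x), total)
        for x in range(max(0, n - N + M), min(n, M) + 1)
    }
    mean = sum(x * f for x, f in pdf.items())
    var = sum(x * x * f for x, f in pdf.items()) - mean**2
    return mean, var


for n, M, N in [(5, 10, 30), (10, 20, 100), (25, 10, 30)]:
    mean, var = hyp_moments(n, M, N)
    p = Fraction(M, N)
    assert mean == n * p                                    # E(X) = nM/N
    assert var == n * p * (1 - p) * Fraction(N - n, N - 1)  # stated Var(X)
    print((n, M, N), mean, var)
```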
Under certain conditions, the binomial distribution can be used to approximate the hypergeometric distribution.


Theorem 3.2.1 If X ~ HYP(n, M, N), then for each value x = 0, 1, ..., n, and as N → ∞ and M → ∞ with M/N = p, a positive constant,

$$\lim_{N \to \infty}\frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} = \binom{n}{x}p^x(1 - p)^{n-x} \qquad (3.2.14)$$
Proof

The proof is based on rearranging the factorials in equation (3.2.10) to obtain an expression of the form $\binom{n}{x}$ times a product of ratios that converges to $p^x(1 - p)^{n-x}$ as M → ∞. ∎


This provides an approximation when the number selected, n, is small relative to the size of the collection, N, and the number of items of type 1, M. This is intuitively reasonable because the binomial distribution is applicable when we sample with replacement, while the hypergeometric distribution is applicable when we sample without replacement. If the size of the collection sampled from is large, then it should not make a great deal of difference whether a particular item is returned to the collection before the next one is selected.
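To see Theorem 3.2.1 at work, this illustrative sketch (with arbitrarily chosen values n = 10, x = 5, and M/N = 0.4) computes h(5; 10, M, N) for increasing N and compares it with b(5; 10, 0.4). The gap between the two shrinks as the collection grows.

```python
from math import comb


def h(x, n, M, N):
    """Hypergeometric pdf (3.2.10), as a float."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def b(x, n, p):
    """Binomial pdf (3.2.3)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, x, p = 10, 5, 0.4
gaps = []
for N in (50, 500, 5000, 50000):
    M = 2 * N // 5  # keeps M/N = 0.4 exactly for these N
    gap = abs(h(x, n, M, N) - b(x, n, p))
    gaps.append(gap)
    print(N, round(h(x, n, M, N), 5), round(b(x, n, p), 5), gap)
```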


Ten seeds are selected from a bin that contains 1000 flower seeds, of which 400 are red flowering seeds, and the rest are of other colors. How likely is it to obtain exactly five red flowering seeds? Strictly speaking, this is hypergeometric and h(5; 10, 400, 1000) = 0.2013. The binomial approximation is b(5; 10, 0.4) = 0.2007.

If we work this directly, with the method of conditional probability, it provides some additional insight into Theorem 3.2.1. Suppose we draw the 10 seeds, one at a time, without replacement from the bin. To obtain five red flowering seeds, it is necessary to obtain some permutation of five red and five other, of which there are $\binom{10}{5}$ possible. Each one would have the same probability of occurrence, namely (written here for the order in which the five red seeds come first)

$$\frac{400}{1000}\cdot\frac{399}{999}\cdots\frac{396}{996}\cdot\frac{600}{995}\cdot\frac{599}{994}\cdots\frac{596}{991}$$

which is close to $(0.4)^5(0.6)^5$.
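The seed example can be verified exactly. This sketch uses only the standard library (`math.prod` requires Python 3.8 or later). It computes h(5; 10, 400, 1000) and the sequential product for the ordering with the five red seeds first, then checks that multiplying that product by C(10, 5) reproduces the hypergeometric probability exactly.

```python
from fractions import Fraction
from math import comb, prod

N, M, n, x = 1000, 400, 10, 5

# Exact hypergeometric probability h(5; 10, 400, 1000).
h_exact = Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

# One ordering, drawn one seed at a time: five red seeds first, then five others.
red_first = prod(Fraction(M - i, N - i) for i in range(x))
others_next = prod(Fraction(N - M - j, N - x - j) for j in range(n - x))
one_order = red_first * others_next

# Each of the C(10, 5) orderings has this same probability.
assert comb(n, x) * one_order == h_exact

binom = comb(n, x) * Fraction(2, 5) ** x * Fraction(3, 5) ** (n - x)
print(round(float(h_exact), 4), round(float(binom), 4))
```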


There are other special discrete distributions that are based on independent 
Bernoulli trials. For example, suppose a certain satellite launch has a 0.7 prob- 
ability of success. The probability remains constant at 0.7 for repeated launches, 
and the success or failure of one launch does not depend on the outcome of other 
launches. It is feared that funding for the satellite program will be cut off if 
success is not achieved within three trials. What is the probability that the first 
success will occur within three trials? The probability of success occurring on the first trial is $p_1 = 0.7$, the probability of the first success occurring on the second trial is $p_2 = 0.3(0.7)$, and the probability of the first success occurring on the third trial is $(0.3)^2(0.7)$. The probability of first success within three trials is $0.7 + 0.3(0.7) + (0.3)^2(0.7) = 0.973$. General solutions to this type of problem lead to the geometric and negative binomial distributions.


GEOMETRIC AND NEGATIVE BINOMIAL DISTRIBUTIONS 


We again consider a sequence of independent Bernoulli trials with probability of success p = P(E). In the case of the binomial distribution, the number of trials was a fixed number n, and the variable of interest was the number of successes. Now we consider the number of trials required to achieve a specified number of successes.

If we denote the number of trials required to obtain the first success by X, then 
the discrete pdf of X is given by 


$$g(x; p) = pq^{x-1}, \qquad x = 1, 2, 3, \ldots \qquad (3.2.15)$$


For the event [X = x] to occur, it is necessary to have a particular permu- 
tation consisting of x — 1 failures followed by a success. Because the trials are 
independent, this probability is the product of p with x — 1 factors of q = 1 —p, 
as given by equation (3.2.15). 

The general properties (2.2.2) and (2.2.3) are satisfied by equation (3.2.15), because 0 < p ≤ 1 and

$$\sum_{x=1}^{\infty} g(x; p) = p\sum_{x=1}^{\infty} q^{x-1} = p(1 + q + q^2 + \cdots) = p\left(\frac{1}{1 - q}\right) = \frac{p}{p} = 1 \qquad (3.2.16)$$


The distribution of X is known as the geometric distribution, which gets its name from its relationship with the geometric series that was used to evaluate equation (3.2.16). This also sometimes is known as the Pascal distribution. We will use special notation that designates that X has pdf (3.2.15):

$$X \sim \mathrm{GEO}(p) \qquad (3.2.17)$$
It also follows from the properties of the geometric series that the CDF of X is

$$G(x; p) = \sum_{i=1}^{x} pq^{i-1} = 1 - q^x, \qquad x = 1, 2, 3, \ldots \qquad (3.2.18)$$


The probability a certain baseball player gets a hit is 0.3, and we assume that times at bat are independent. The probability that he will require five times at bat to get his first hit is $g(5; 0.3) = (0.7)^4(0.3)$. Given that he has been at bat 10 times without a hit, the probability is still $(0.7)^4(0.3)$ that it will require five more times at bat for him to get his first hit. Also, the probability that five or fewer at bats are required to obtain the first hit is given by

$$G(5; 0.3) = 1 - (0.7)^5 = 0.83193$$
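A short sketch of the geometric pdf (3.2.15) and CDF (3.2.18) applied to the baseball example; the function names `g` and `G` follow the text's notation. The satellite example earlier in this section is also an instance of the CDF, since 0.7 + 0.3(0.7) + (0.3)²(0.7) = G(3; 0.7) = 1 − (0.3)³.

```python
def g(x, p):
    """Geometric pdf (3.2.15): p q^(x - 1), x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1)


def G(x, p):
    """Geometric CDF (3.2.18): 1 - q^x."""
    return 1 - (1 - p) ** x


# Baseball example: P(hit) = 0.3 at each time at bat.
first_hit_on_5th = g(5, 0.3)
first_hit_within_5 = G(5, 0.3)

# Satellite example: P(success) = 0.7 per launch; first success within three launches.
first_success_within_3 = G(3, 0.7)

print(round(first_hit_on_5th, 5), round(first_hit_within_5, 5), round(first_success_within_3, 3))
```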
The geometric distribution is the only discrete probability distribution that has 
a so-called no-memory property. 


Theorem 3.2.2 No-Memory Property If X ~ GEO(p), then

$$P[X > j + k \mid X > j] = P[X > k]$$

Proof

Because the event [X > j + k] is contained in [X > j], and $P[X > k] = 1 - G(k; p) = (1 - p)^k$ by (3.2.18),

$$P[X > j + k \mid X > j] = \frac{P[X > j + k]}{P[X > j]} = \frac{(1 - p)^{j+k}}{(1 - p)^{j}} = (1 - p)^k = P[X > k]$$

Thus, knowing that j trials have passed without a success does not affect the probability of k more trials being required to obtain a success. That is, having several failures in a row does not mean that you are more “due” for a success.
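The no-memory property can be checked exactly for small j and k. So as not to simply reuse the closed form, this illustrative sketch computes P[X > k] as 1 minus a partial sum of the pdf (3.2.15), using exact fractions with p = 3/10 as in the baseball example. The helper `tail` is a hypothetical name.

```python
from fractions import Fraction


def g(x, p):
    """Geometric pdf (3.2.15)."""
    return p * (1 - p) ** (x - 1)


def tail(k, p):
    """P[X > k] = 1 - (g(1; p) + ... + g(k; p)), computed from the pdf."""
    return 1 - sum((g(x, p) for x in range(1, k + 1)), Fraction(0))


p = Fraction(3, 10)
for j in range(8):
    for k in range(8):
        # P[X > j + k | X > j] = P[X > j + k] / P[X > j]
        assert tail(j + k, p) / tail(j, p) == tail(k, p)

# Baseball example: after 10 hitless at bats, P[first hit on at bat 15 | X > 10] = g(5; 0.3).
conditional = g(15, p) / tail(10, p)
print(conditional, g(5, p))
```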

Example 1.2.3 involved tossing a coin until the first head occurs. If X is the number of tosses, and if p = P(H), then X ~ GEO(p). It was noted that an outcome corresponding to never obtaining a head was unnecessary. Of course, this is because the probability of never obtaining a head is zero. Specifically, if A represents the event “a head is never obtained,” then A' is the event “at least one head,” and

$$P(A) = 1 - P(A') = 1 - \sum_{x=1}^{\infty} g(x; p) = 1 - 1 = 0$$

The mean of X ~ GEO(p) is obtained as follows:

$$E(X) = \sum_{x=1}^{\infty} xpq^{x-1} = p\sum_{x=1}^{\infty}\frac{d}{dq}q^x = p\,\frac{d}{dq}\left(\frac{q}{1 - q}\right) = p(1 - q)^{-2} = \frac{1}{p}$$

By a similar argument, $E(X^2) = (1 + q)/p^2$ and, consequently, $\mathrm{Var}(X) = q/p^2$.
The properties of a geometric distribution are summarized in Appendix B. 
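The geometric moments can be checked numerically by truncating the infinite series. The sketch below assumes p = 0.3 and a truncation point K = 2000, chosen so that the neglected tail is far below floating-point precision; it is a numerical check, not a derivation.

```python
p = 0.3
q = 1 - p
K = 2000  # truncation point; q**K is essentially zero in floating point for p = 0.3

terms = [(x, p * q ** (x - 1)) for x in range(1, K + 1)]
mean = sum(x * f for x, f in terms)
second = sum(x * x * f for x, f in terms)
var = second - mean**2

print(mean, 1 / p)             # E(X) and 1/p
print(second, (1 + q) / p**2)  # E(X^2) and (1 + q)/p^2
print(var, q / p**2)           # Var(X) and q/p^2
```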

It should be noted that some authors consider a slightly different variable, Y, 
defined as the number of failures that occur before the first success. Thus, 
Y = X — 1, and 


$$P[Y = y] = (1 - p)^y p, \qquad y = 0, 1, 2, \ldots \qquad (3.2.19)$$


This probability distribution also sometimes is referred to as a geometric dis- 
tribution. 


NEGATIVE BINOMIAL DISTRIBUTION 


In repeated independent Bernoulli trials, let X denote the number of trials 
required to obtain r successes. Then the probability distribution of X is the nega- 
tive binomial distribution with discrete pdf given by 


$$f(x; r, p) = \binom{x - 1}{r - 1}p^r q^{x-r}, \qquad x = r, r + 1, \ldots \qquad (3.2.20)$$

For the event [X = x] to occur, one must obtain the rth success on the xth trial by obtaining “r − 1 successes in the first x − 1 trials” in any order, and then obtaining a “success on the xth trial.” Thus, the probability of the first event may be expressed as

$$\binom{x - 1}{r - 1}p^{r-1}(1 - p)^{(x-1)-(r-1)}$$

and multiplying this by p, the probability of the second event, produces equation (3.2.20).

A special notation, which designates that X has the negative binomial distribution (3.2.20), is

$$X \sim \mathrm{NB}(r, p) \qquad (3.2.21)$$


Team A plays team B in a seven-game world series. That is, the series is over 
when either team wins four games. For each game, P(A wins) = 0.6, and the 
games are assumed independent. What is the probability that the series will end 
in exactly six games? We have x = 6, r = 4, and p = 0.6 in equation (3.2.20), and 


$$P(\text{A wins series in 6}) = f(6; 4, 0.6) = \binom{5}{3}(0.6)^4(0.4)^2 = 0.20736$$

Similarly,

$$P(\text{B wins series in 6}) = f(6; 4, 0.4) = \binom{5}{3}(0.4)^4(0.6)^2 = 0.09216$$

$$P(\text{series goes 6 games}) = 0.20736 + 0.09216 = 0.29952$$
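The world series probabilities follow directly from (3.2.20). In this sketch, `nb_pdf` is a hypothetical helper name; as an extra consistency check, it also confirms that the probabilities of a 4-, 5-, 6-, or 7-game series add up to 1.

```python
from math import comb


def nb_pdf(x, r, p):
    """Negative binomial pdf (3.2.20): P[rth success occurs on trial x]."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


a_in_6 = nb_pdf(6, 4, 0.6)  # team A's fourth win comes in game 6
b_in_6 = nb_pdf(6, 4, 0.4)  # team B's fourth win comes in game 6
print(round(a_in_6, 5), round(b_in_6, 5), round(a_in_6 + b_in_6, 5))

# The series must end in 4, 5, 6, or 7 games, so these probabilities add to one.
length = {x: nb_pdf(x, 4, 0.6) + nb_pdf(x, 4, 0.4) for x in range(4, 8)}
print(length)
```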


The general properties (2.2.2) and (2.2.3) are satisfied by equation (3.2.20), because 0 < p ≤ 1 and, substituting i = x − r,

$$\sum_{x=r}^{\infty} f(x; r, p) = p^r\sum_{i=0}^{\infty}\binom{i + r - 1}{r - 1}q^i = p^r(1 - q)^{-r} = 1 \qquad (3.2.22)$$

Note that $\sum_{i=0}^{\infty}\binom{i + r - 1}{r - 1}q^i$ is the series expansion of $(1 - q)^{-r}$ as given in any standard mathematical handbook. The name “negative binomial” distribution results from its relationship to this binomial series expansion with negative exponent, −r, which was used to establish equation (3.2.22).

Again, some authors consider the alternate variable Y, which is defined to be the number of failures that occur prior to obtaining the rth success. That is, X = Y + r and

$$f_Y(y; r, p) = P[Y = y] = \binom{y + r - 1}{y}p^r(1 - p)^y = f(y + r; r, p), \qquad y = 0, 1, 2, \ldots \qquad (3.2.23)$$

In Example 3.2.8, the terminology now would become

$$P(\text{A wins series in 6}) = P(\text{2 losses occur before 4 wins}) = P[Y = 2] = \binom{5}{2}(0.6)^4(0.4)^2$$

as before.
It can be shown that if X ~ NB(r, p), then E(X) = r/p, $\mathrm{Var}(X) = rq/p^2$, and $M_X(t) = [pe^t/(1 - qe^t)]^r$. These properties are left as an exercise (see Exercise 19).
A summary of the properties of the geometric and negative binomial distribu- 
tions is given in Appendix B. 
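The stated mean and variance of NB(r, p) are left as an exercise, but they can at least be checked numerically. This sketch assumes r = 4 and p = 0.6 from the world series example and truncates the series at K = 500; it is a sanity check, not a proof.

```python
from math import comb

r, p = 4, 0.6
q = 1 - p
K = 500  # truncation point for the infinite series

pdf = {x: comb(x - 1, r - 1) * p**r * q ** (x - r) for x in range(r, K + 1)}
total = sum(pdf.values())
mean = sum(x * f for x, f in pdf.items())
var = sum(x * x * f for x, f in pdf.items()) - mean**2

print(total, mean, r / p, var, r * q / p**2)
```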
BINOMIAL RELATIONSHIP TO NEGATIVE BINOMIAL


The negative binomial problem sometimes is referred to as inverse binomial sampling. Suppose X ~ NB(r, p) and W ~ BIN(n, p). It follows that

$$P[X \le n] = P[W \ge r] \qquad (3.2.24)$$

That is, W ≥ r corresponds to the event of having r or more successes in n trials, and that means n or fewer trials will be needed to obtain the first r successes. Clearly, the negative binomial distribution can be expressed in terms of the binomial CDF by the relationship

$$F(x; r, p) = P[X \le x] = 1 - B(r - 1; x, p) = B(x - r; x, q) \qquad (3.2.25)$$

where the last equality follows from (3.2.6).


If in Example 3.2.8 we are interested in the probability that team A wins the world series in six or fewer games, then x = 6, r = 4, p = 0.6, and

$$\begin{aligned} P(\text{A wins series in 6 or less}) &= P[X \le 6] = F(6; 4, 0.6) = \sum_{x=4}^{6}\binom{x - 1}{3}(0.6)^4(0.4)^{x-4} \\ &= B(2; 6, 0.4) = \sum_{w=0}^{2}\binom{6}{w}(0.4)^w(0.6)^{6-w} = 0.5443 \end{aligned}$$

That is, the probability of winning the series in six or fewer games is equivalent to the probability of suffering two or fewer losses in six games. The main advantage of writing the negative binomial CDF in terms of a binomial CDF is that the binomial CDF is tabulated much more extensively in the literature. Also, known approximations to the binomial can be applied.
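Relationship (3.2.25) can be checked numerically. In this sketch, `nb_cdf` sums the negative binomial pdf directly and `binom_cdf` computes B(x; n, p); both names are hypothetical. The first comparison reproduces the series calculation above.

```python
from math import comb


def nb_cdf(x, r, p):
    """F(x; r, p) = P[X <= x] for X ~ NB(r, p), by summing the pdf (3.2.20)."""
    return sum(comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r) for k in range(r, x + 1))


def binom_cdf(x, n, p):
    """B(x; n, p), the binomial CDF (3.2.5)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# Team A (p = 0.6) wins the series in six or fewer games.
direct = nb_cdf(6, 4, 0.6)
via_binomial = binom_cdf(6 - 4, 6, 0.4)  # B(x - r; x, q): at most two losses in six games
print(round(direct, 4), round(via_binomial, 4))
```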

We will consider one more discrete distribution, which cannot be obtained 
directly by counting arguments, but which can be derived as a limiting form of 
the binomial distribution. 


POISSON DISTRIBUTION 


A discrete random variable X is said to have the Poisson distribution with parameter μ > 0 if it has discrete pdf of the form

$$f(x; \mu) = \frac{e^{-\mu}\mu^x}{x!}, \qquad x = 0, 1, 2, \ldots \qquad (3.2.26)$$
A special notation that designates that a random variable X has the Poisson distribution with parameter μ is

$$X \sim \mathrm{POI}(\mu) \qquad (3.2.27)$$

Properties (2.2.2) and (2.2.3) clearly are satisfied, because μ > 0 implies f(x; μ) ≥ 0 and

$$\sum_{x=0}^{\infty} f(x; \mu) = e^{-\mu}\sum_{x=0}^{\infty}\frac{\mu^x}{x!} = e^{-\mu}e^{\mu} = 1 \qquad (3.2.28)$$

The CDF of X ~ POI(μ), denoted by

$$F(x; \mu) = \sum_{k=0}^{x} f(k; \mu) \qquad (3.2.29)$$

cannot be expressed in a simpler functional form, but it can be tabulated. Values of F(x; μ) are provided in Table 2 (Appendix C) for various values of x and μ.

If X ~ POI(μ), then

$$M_X(t) = \sum_{x=0}^{\infty} e^{tx}\frac{e^{-\mu}\mu^x}{x!} = e^{-\mu}\sum_{x=0}^{\infty}\frac{(\mu e^t)^x}{x!} = e^{-\mu}e^{\mu e^t}$$

Thus,

$$M_X(t) = e^{\mu(e^t - 1)}, \qquad -\infty < t < \infty \qquad (3.2.30)$$

It follows that $M_X'(t) = e^{\mu(e^t - 1)}\mu e^t = M_X(t)\mu e^t$, and thus $M_X'(0) = M_X(0)\mu e^0 = \mu$. Similarly, $M_X''(t) = [M_X(t) + M_X'(t)]\mu e^t$, so that $M_X''(0) = (1 + \mu)\mu$ and, thus, $\mathrm{Var}(X) = E(X^2) - \mu^2 = \mu + \mu^2 - \mu^2 = \mu$.
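As a numerical check of (3.2.28), (3.2.30), and the moment results, the following sketch truncates the Poisson series at 100 terms for the arbitrarily chosen value μ = 2.5; the neglected terms are negligible at that point.

```python
import math


def poisson_pdf(x, mu):
    """Poisson pdf (3.2.26): e^(-mu) mu^x / x!."""
    return math.exp(-mu) * mu**x / math.factorial(x)


mu = 2.5
K = 100  # truncation point; the omitted terms are negligible for mu = 2.5

total = sum(poisson_pdf(x, mu) for x in range(K))
mean = sum(x * poisson_pdf(x, mu) for x in range(K))
var = sum(x * x * poisson_pdf(x, mu) for x in range(K)) - mean**2

# MGF (3.2.30): compare the truncated defining sum with e^{mu (e^t - 1)} at one t.
t = 0.4
mgf_sum = sum(math.exp(t * x) * poisson_pdf(x, mu) for x in range(K))
mgf_closed = math.exp(mu * (math.exp(t) - 1))
print(total, mean, var, mgf_sum, mgf_closed)
```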

A summary of properties of the Poisson distribution is given in Appendix B. 

It is possible to derive equation (3.2.26) as a limiting form of the binomial pdf if n → ∞ and p → 0 with np = μ constant.


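Theorem 3.2.3 If X ~ BIN(n, p), then for each value x = 0, 1, 2, ..., and as p → 0 with np = μ constant,

$$\lim_{n \to \infty}\binom{n}{x}p^x(1 - p)^{n-x} = \frac{e^{-\mu}\mu^x}{x!} \qquad (3.2.31)$$


Proof

$$\binom{n}{x}p^x(1 - p)^{n-x} = \frac{n(n - 1)\cdots(n - x + 1)}{n^x}\cdot\frac{\mu^x}{x!}\left(1 - \frac{\mu}{n}\right)^n\left(1 - \frac{\mu}{n}\right)^{-x}$$

The result follows by taking limits on both sides of the equation, and recalling from calculus that

$$\lim_{n \to \infty}\left(1 - \frac{\mu}{n}\right)^n = e^{-\mu}$$

and, for fixed x,

$$\lim_{n \to \infty}\frac{n(n - 1)\cdots(n - x + 1)}{n^x} = 1, \qquad \lim_{n \to \infty}\left(1 - \frac{\mu}{n}\right)^{-x} = 1$$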

One consequence of Theorem 3.2.3 is that f(x; np) provides an approximation 
to the more complicated b(x; n, p) when n is large and p is small. 


Suppose that 1% of all transistors produced by a certain company are defective. A new model of computer requires 100 of these transistors, and 100 are selected at random from the company's assembly line. The exact probability of obtaining a specified number of defectives, say 3, is b(3; 100, 0.01) = 0.0610, whereas the Poisson approximation is f(3; 1) = 0.0613.
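The transistor example can be reproduced, and extended to a few neighboring values of x, with the sketch below (standard library only; the function names are hypothetical).

```python
import math


def binom_pdf(x, n, p):
    """Binomial pdf b(x; n, p) from (3.2.3)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pdf(x, mu):
    """Poisson pdf f(x; mu) from (3.2.26)."""
    return math.exp(-mu) * mu**x / math.factorial(x)


n, p = 100, 0.01
for x in range(6):
    print(x, round(binom_pdf(x, n, p), 4), round(poisson_pdf(x, n * p), 4))
```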


As a general rule, the approximation gives reasonable results provided n ≥ 100 and p ≤ 0.01, and when x is close to np.

Theorem 3.2.3 also gives some insight into the types of problems for which the Poisson distribution provides an adequate probability model: when the variable of interest results, at least approximately, from a large number of independent Bernoulli-type experiments, each with a small probability of success. For example, the number of accidents in a year at a particular intersection may be approximately a Poisson variable, because a large number of vehicles may pass the intersection in a year and the probability of any one vehicle having an accident is small. In this case, the parameter μ would be directly affected by the number of vehicles per year and the degree of risk at the intersection (roughly μ = np, with n the number of vehicles and p the risk per vehicle).


POISSON PROCESSES 


Consider a physical situation in which a certain type of event is recurring, such as 
telephone calls or defects in a long piece of wire. Let X(t) denote the number of 
such events that occur in a given interval [0, t], and suppose that the following 
assumptions hold. First, the probability that an event will occur in a given short interval [t, t + Δt] is approximately proportional to the length of the interval, Δt, and does not depend on the position of the interval. Furthermore, suppose that the occurrences of events in nonoverlapping intervals are independent, and the probability of two or more events in a short interval [t, t + Δt] is negligible. If these assumptions become valid as Δt → 0, then the distribution of X(t) will be Poisson. The assumptions and conclusions are stated mathematically in the following theorem. Note that o(Δt) denotes a function of Δt such that

$$\lim_{\Delta t \to 0} o(\Delta t)/\Delta t = 0$$

In other words, o(Δt) is negligible relative to Δt.


Theorem 3.2.4 Homogeneous Poisson Process Let X(t) denote the number of occurrences in the
interval [0, t], and let $P_n(t)$ = P[n occurrences in the interval [0, t]]. Consider the
following properties:

1. X(0) = 0.
2. P[X(t + h) − X(t) = n | X(s) = m] = P[X(t + h) − X(t) = n] for all 0 ≤ s ≤ t and 0 < h.
3. P[X(t + Δt) − X(t) = 1] = λΔt + o(Δt) for some constant λ > 0.
4. P[X(t + Δt) − X(t) ≥ 2] = o(Δt).

If properties 1 through 4 hold, then for all t > 0,

$$P_n(t)=P[X(t)=n]=\frac{e^{-\lambda t}(\lambda t)^{n}}{n!} \tag{3.2.32}$$


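Proof

Now n events may occur in the interval [0, t + Δt] by having 0 events in
[t, t + Δt] and n events in [0, t], or one event in [t, t + Δt] and n − 1 events in
[0, t], or two or more events in [t, t + Δt]; thus for n > 0,

$$\begin{aligned}
P_n(t+\Delta t) &= P_{n-1}(t)P_1(\Delta t)+P_n(t)P_0(\Delta t)+o(\Delta t)\\
&= P_{n-1}(t)[\lambda\Delta t+o(\Delta t)]+P_n(t)[1-\lambda\Delta t-o(\Delta t)]+o(\Delta t)
\end{aligned}$$

but

$$\begin{aligned}
\frac{dP_n(t)}{dt} &= \lim_{\Delta t\to 0}\frac{P_n(t+\Delta t)-P_n(t)}{\Delta t}\\
&= \lim_{\Delta t\to 0}\frac{P_{n-1}(t)\lambda\Delta t+P_n(t)-P_n(t)\lambda\Delta t-P_n(t)}{\Delta t}\\
&= \lambda[P_{n-1}(t)-P_n(t)]
\end{aligned}$$

For n = 0,

$$\begin{aligned}
P_0(t+\Delta t) &= P_0(t)P_0(\Delta t)\\
&= P_0(t)[1-\lambda\Delta t-o(\Delta t)]\\
\frac{dP_0(t)}{dt} &= \lim_{\Delta t\to 0}\frac{P_0(t+\Delta t)-P_0(t)}{\Delta t}\\
&= \lim_{\Delta t\to 0}\frac{-\lambda\Delta t\,P_0(t)-o(\Delta t)P_0(t)}{\Delta t}\\
&= -\lambda P_0(t)
\end{aligned}$$

Assuming the initial condition $P_0(0) = 1$, the solution to the above differential
equation is verified easily to be

$$P_0(t)=e^{-\lambda t}$$

Similarly, letting n = 1,

$$\frac{dP_1(t)}{dt}=\lambda[P_0(t)-P_1(t)]=\lambda[e^{-\lambda t}-P_1(t)]$$

which, with the initial condition $P_1(0) = 0$, gives

$$P_1(t)=\lambda t\,e^{-\lambda t}$$

It can be shown by induction that

$$P_n(t)=\frac{e^{-\lambda t}(\lambda t)^{n}}{n!}, \qquad n=0,1,2,\ldots$$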


Thus, X(t) ~ POI(λt), where μ = E[X(t)] = λt. The proportionality constant λ
reflects the rate of occurrence or intensity of the Poisson process. Because λ is
assumed constant over t, the process is referred to as a homogeneous Poisson
process (HPP). Because λ is constant and the increments are independent, it turns
out that one does not need to be concerned about the location of the interval
under question, and the model X ~ POI(μ) is applicable for any interval of
length t, [s, s + t], with μ = λt. The constant λ is the rate of occurrence per unit
length, and the interval is t units long.
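The differential equations in the proof can also be checked numerically. The following is only an illustrative sketch: it integrates the equations with a plain forward Euler step (our choice of method, not the text's), using a hypothetical rate λ = 2 and interval length t = 1.5, and compares the result with $e^{-\lambda t}(\lambda t)^n/n!$.

```python
import math

def poisson_pmf(n, mu):
    """Poisson probability e^(-mu) mu^n / n!."""
    return math.exp(-mu) * mu**n / math.factorial(n)

def forward_equations(lam, t, n_max, steps=20_000):
    """Explicit Euler solution of dP_0/dt = -lam*P_0 and
    dP_n/dt = lam*(P_{n-1} - P_n) for n = 1..n_max,
    starting from P_0(0) = 1 and P_n(0) = 0 for n >= 1 (that is, X(0) = 0)."""
    h = t / steps
    p = [1.0] + [0.0] * n_max
    for _ in range(steps):
        # The right-hand side uses only the values from the previous step.
        p = [p[n] + h * lam * ((p[n - 1] if n > 0 else 0.0) - p[n])
             for n in range(n_max + 1)]
    return p

lam, t = 2.0, 1.5            # hypothetical rate and interval length, so mu = lam*t = 3
numeric = forward_equations(lam, t, n_max=8)
for n, value in enumerate(numeric):
    print(n, round(value, 4), round(poisson_pmf(n, lam * t), 4))
```

Because each $P_n$ depends only on $P_{n-1}$ and itself, truncating the system at `n_max` leaves the lower-order probabilities unaffected; the Euler error shrinks roughly in proportion to the step size.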

Finally, we will consider a rather simple type of discrete distribution known as
the discrete uniform distribution.


DISCRETE UNIFORM DISTRIBUTION 


Many problems, essentially those involving classical assignment of probability, 
can be modeled by a discrete random variable that assumes all of its values with 
the same probability. It usually is possible to relate such problems to a set of 
consecutive integers 1, 2,..., N. 

A discrete random variable X has the discrete uniform distribution on the inte- 
gers 1, 2,..., N if it has a pdf of the form 


$$f(x)=\frac{1}{N}, \qquad x=1,2,\ldots,N \tag{3.2.33}$$


A special notation for this situation is 


X ~ DU(N) (3.2.34)

Games of chance such as lotteries or rolling unbiased dice obviously are modeled
by equation (3.2.33). For example, the number obtained by rolling an ordinary 
six-sided die would correspond to DU(6). Of course, this assumes that the die is 
not loaded or biased in some way in favor of certain numbers. 

Another example that we have considered is the multiple-choice test of 
Example 3.2.3. If, on any question, we associate the four choices with the integers 
1, 2, 3, and 4, then the response, X, on any given question that is answered at
random is DU(4). This also models the outcome, X, of rolling a four-sided die. 
The discrete pdf and CDF of X ~ DU(4) are given in Figure 3.1. 


FIGURE 3.1 Discrete pdf and CDF of number obtained on a single roll of a four-sided die (panels: f(x) and F(x))


Comparing Figure 3.1 with Figures 2.2 and 2.3, the distribution of the
maximum over two rolls favors the larger values, whereas the result of a single 
roll does not. This should not be a surprise because there are more ways to 
achieve the larger values based on the maximum. 

The mean is obtained as follows:

$$E(X)=\sum_{x=1}^{N}x\cdot\frac{1}{N}=\frac{1}{N}\cdot\frac{N(N+1)}{2}=\frac{N+1}{2}$$

Similarly, E(X²) = (N + 1)(2N + 1)/6 and Var(X) = (N² − 1)/12.
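The closed forms for E(X), E(X²), and Var(X) can be confirmed with exact rational arithmetic. This is a minimal sketch using Python's `fractions` module; the function name `discrete_uniform_moments` is hypothetical.

```python
from fractions import Fraction

def discrete_uniform_moments(N):
    """Exact E(X), E(X^2), and Var(X) for X ~ DU(N), where f(x) = 1/N on 1, ..., N."""
    p = Fraction(1, N)
    mean = sum(x * p for x in range(1, N + 1))
    second = sum(x * x * p for x in range(1, N + 1))
    return mean, second, second - mean**2

# The four-sided die of Figure 3.1, DU(4), and an ordinary six-sided die, DU(6)
for N in (4, 6):
    mean, second, var = discrete_uniform_moments(N)
    print(f"DU({N}): E(X) = {mean}, E(X^2) = {second}, Var(X) = {var}")
```

For DU(6) this gives E(X) = 7/2 and Var(X) = 35/12, matching (N + 1)/2 and (N² − 1)/12.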


3.3 SPECIAL CONTINUOUS DISTRIBUTIONS


We will now discuss several special continuous distributions. 


UNIFORM DISTRIBUTION 


Suppose that a continuous random variable X can assume values only in a
bounded interval, say the open interval (a, b), and suppose that the pdf is
constant, say f(x) = c over the interval. Property (2.3.5) implies c = 1/(b − a),
because $1=\int_a^b c\,dx=c(b-a)$. If we define f(x) = 0 outside the interval, then
property (2.3.4) also is satisfied.

This special distribution is known as the uniform distribution on the interval
(a, b). The pdf is

$$f(x; a, b)=\frac{1}{b-a}, \qquad a<x<b \tag{3.3.1}$$


and zero otherwise. A notation that designates that X has pdf of the form (3.3.1) 
is 


X ~ UNIF(a, b) (3.3.2) 
This is the continuous counterpart of the discrete uniform distribution, which
was discussed in Section 3.2. It provides a probability model for selecting a point
“at random” from an interval (a, b). A more specific example is given by the
random waiting times for the bus passenger in Example 2.3.1. As noted earlier, it
does not matter whether we include the endpoints a = 0 and b = 5.

Perhaps a more important application occurs in the case of computer simula- 
tion, which relies on the generation of “random numbers.” Random number gen- 
erators are functions in the computer language, or in some cases subroutines in 
programs, which are designed to produce numbers that behave as if they were 
data from UNIF(0, 1). 

The CDF of X ~ UNIF(a, b) has the form

$$F(x; a, b)=\begin{cases}0 & x\le a\\ \dfrac{x-a}{b-a} & a<x<b\\ 1 & b\le x\end{cases} \tag{3.3.3}$$

The general form of the graphs of f(x; a, b) and F(x; a, b) can be seen in Figure
2.5, where, in general, the endpoints would be a and b, rather than 0 and 5.

If X ~ UNIF(a, b), then

$$E(X)=\int_a^b x\left(\frac{1}{b-a}\right)dx=\frac{b^{2}-a^{2}}{2(b-a)}=\frac{(b+a)(b-a)}{2(b-a)}=\frac{a+b}{2}$$

Furthermore,

$$E(X^{2})=\int_a^b x^{2}\left(\frac{1}{b-a}\right)dx=\frac{b^{3}-a^{3}}{3(b-a)}=\frac{(b^{2}+ab+a^{2})(b-a)}{3(b-a)}=\frac{b^{2}+ab+a^{2}}{3}$$

Thus,

$$\operatorname{Var}(X)=\frac{b^{2}+ab+a^{2}}{3}-\frac{(a+b)^{2}}{4}=\frac{(b-a)^{2}}{12}$$


In this case, we can conclude that the mean of the distribution is the midpoint, 
and the variance is proportional to the square of the length of the interval (a, b). 
This is consistent with our interpretations of mean and variance as respective 
measures of the “center” and “variability” in a population. 

For example, the temperature reading (in Fahrenheit degrees) at a randomly
selected time at some location is a random variable X ~ UNIF(50, 90), and the
reading at a second location is a random variable Y ~ UNIF(30, 110). The
means are the same, $\mu_X=\mu_Y=70$, but the variances are different,
$\sigma_X^2=400/3<\sigma_Y^2=1600/3$.

The 100 × pth percentile, which is obtained by equating the right side of
equation (3.3.3) to p and solving for x, is $x_p=a+(b-a)p$.
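The temperature example reduces to the mean and variance formulas plus the percentile rule. The sketch below uses exact fractions, so it assumes integer endpoints a and b; the function names are illustrative only.

```python
from fractions import Fraction

def unif_mean(a, b):
    """E(X) = (a + b)/2 for X ~ UNIF(a, b)."""
    return Fraction(a + b, 2)

def unif_var(a, b):
    """Var(X) = (b - a)^2 / 12 for X ~ UNIF(a, b)."""
    return Fraction((b - a) ** 2, 12)

def unif_percentile(a, b, p):
    """x_p = a + (b - a) p, found by setting F(x; a, b) = p in (3.3.3)."""
    return a + (b - a) * p

# Temperature example: X ~ UNIF(50, 90), Y ~ UNIF(30, 110)
print(unif_mean(50, 90), unif_var(50, 90))      # 70 400/3
print(unif_mean(30, 110), unif_var(30, 110))    # 70 1600/3
print(unif_percentile(50, 90, Fraction(1, 2)))  # median of X: 70
```

Both locations share the midpoint 70, while the wider interval has four times the variance, since the interval length doubles.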

The main properties of a uniform distribution are summarized in Appendix B.


GAMMA DISTRIBUTION


A continuous distribution that occurs frequently in applications is called the 
gamma distribution. The name results from its relationship to a function called 
the gamma function. 


Definition 3.3.1

The gamma function, denoted by Γ(κ) for all κ > 0, is given by

$$\Gamma(\kappa)=\int_0^\infty t^{\kappa-1}e^{-t}\,dt \tag{3.3.4}$$

For example, if κ = 1, then $\Gamma(1)=\int_0^\infty e^{-t}\,dt=1$. The gamma function has
several useful properties, as stated in the following theorem.


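Theorem 3.3.1 The gamma function satisfies the following properties:

$$\Gamma(\kappa)=(\kappa-1)\Gamma(\kappa-1), \qquad \kappa>1 \tag{3.3.5}$$

$$\Gamma(n)=(n-1)!, \qquad n=1,2,\ldots \tag{3.3.6}$$

$$\Gamma\!\left(\tfrac{1}{2}\right)=\sqrt{\pi} \tag{3.3.7}$$

Proof

See Exercise 35.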
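Python's `math.gamma` implements Γ(κ), which makes the properties in Theorem 3.3.1 easy to spot-check numerically (a check at a few points, not a proof). The helper `gamma_integral` is a hypothetical illustration that approximates integral (3.3.4) by Simpson's rule after truncating the range at t = 60, which is adequate for moderate κ ≥ 1.

```python
import math

def gamma_integral(k, upper=60.0, n=60_000):
    """Composite Simpson approximation of (3.3.4), truncated at t = upper.
    Intended for k >= 1, where the integrand t^(k-1) e^(-t) is finite at t = 0."""
    h = upper / n
    f = lambda t: t ** (k - 1) * math.exp(-t)
    total = f(0.0) + f(upper)
    total += 4 * sum(f((2 * i - 1) * h) for i in range(1, n // 2 + 1))
    total += 2 * sum(f(2 * i * h) for i in range(1, n // 2))
    return total * h / 3

# (3.3.5): Gamma(k) = (k - 1) Gamma(k - 1) for k > 1
for k in (1.5, 2.7, 5.25):
    assert math.isclose(math.gamma(k), (k - 1) * math.gamma(k - 1))

# (3.3.6): Gamma(n) = (n - 1)! for n = 1, 2, ...
for n in range(1, 10):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# (3.3.7): Gamma(1/2) = sqrt(pi)
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))

print(gamma_integral(3.5), math.gamma(3.5))
```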


A continuous random variable X is said to have the gamma distribution with
parameters κ > 0 and θ > 0 if it has pdf of the form

$$f(x; \theta, \kappa)=\frac{1}{\theta^{\kappa}\Gamma(\kappa)}x^{\kappa-1}e^{-x/\theta}, \qquad x>0 \tag{3.3.8}$$

and zero otherwise. The function given by equation (3.3.8) satisfies the general
properties (2.3.4) and (2.3.5), with the latter resulting from the substitution t = x/θ
in the integral $\int_0^\infty f(x; \theta, \kappa)\,dx$, resulting in Γ(κ)/Γ(κ) = 1.

A special notation, which designates that X has pdf given by equation (3.3.8), is

X ~ GAM(θ, κ) (3.3.9)

The parameter κ is called a shape parameter because it determines the basic
shape of the graph of the pdf. Specifically, there are three basic shapes, depending
on whether κ < 1, κ = 1, or κ > 1. This is illustrated in Figure 3.2, which shows
the graphs of equation (3.3.8) for κ = 0.5, 1, and 2.

FIGURE 3.2 The pdf's of gamma distributions (curves f(x; θ, 0.5), f(x; θ, 1), and f(x; θ, 2))

Note that the y-axis is an asymptote of y = f(x; θ, κ) if κ < 1, while
f(0; θ, 1) = 1/θ; and if κ > 1, f(0; θ, κ) = 0.
The CDF of X ~ GAM(θ, κ) is

$$F(x; \theta, \kappa)=\int_0^x \frac{1}{\theta^{\kappa}\Gamma(\kappa)}t^{\kappa-1}e^{-t/\theta}\,dt \tag{3.3.10}$$

The substitution u = t/θ in this integral yields

$$F(x; \theta, \kappa)=F(x/\theta; 1, \kappa) \tag{3.3.11}$$


which depends on θ only through the variable x/θ. Such a parameter usually is
called a scale parameter.

Usually, it is important to have a scale parameter in a model so that the results
will not depend on which scale of measurement is used. For example, if X
represents time in months and it is assumed that X ~ GAM(θ, κ) with θ = 12, then

$$P[X\le 24\text{ months}]=F(24/12; 1, \kappa)=F(2; 1, \kappa)$$

If one considers the time Y to be measured in weeks, then an equivalent model
still can be achieved by considering Y to be a gamma variable with
θ = 4 · 12 = 48. For example,

$$P[X\le 24\text{ months}]=P[Y\le 96\text{ weeks}]=F(96/48; 1, \kappa)=F(2; 1, \kappa)$$

as before. Thus, different scales of measurement can be accommodated by
changing the value of the scale parameter in this case without changing to a
different general form.
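A quick numerical check of the scale-parameter argument: the sketch below evaluates the gamma CDF (3.3.10) by Simpson's rule (our choice of quadrature; the text does not prescribe one) and confirms that the month and week versions of the model give the same probability. The text leaves κ unspecified, so κ = 3 here is a hypothetical value.

```python
import math

def gamma_pdf(x, theta, kappa):
    """f(x; theta, kappa) from (3.3.8), for x >= 0."""
    return x ** (kappa - 1) * math.exp(-x / theta) / (theta**kappa * math.gamma(kappa))

def gamma_cdf(x, theta, kappa, n=10_000):
    """F(x; theta, kappa) by composite Simpson's rule on [0, x]; assumes kappa >= 1."""
    h = x / n
    total = gamma_pdf(0.0, theta, kappa) + gamma_pdf(x, theta, kappa)
    total += 4 * sum(gamma_pdf((2 * i - 1) * h, theta, kappa) for i in range(1, n // 2 + 1))
    total += 2 * sum(gamma_pdf(2 * i * h, theta, kappa) for i in range(1, n // 2))
    return total * h / 3

kappa = 3                            # hypothetical shape; the text leaves kappa unspecified
months = gamma_cdf(24, 12, kappa)    # P[X <= 24 months] with theta = 12
weeks = gamma_cdf(96, 48, kappa)     # P[Y <= 96 weeks] with theta = 4 * 12 = 48
unit = gamma_cdf(2, 1, kappa)        # F(2; 1, kappa)
print(months, weeks, unit)
```

All three values agree, and for κ = 3 they equal 1 − 5e⁻² ≈ 0.3233, which anticipates the closed form for integer shapes given next.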

The CDF obtained in equation (3.3.10) generally cannot be evaluated explicitly,
but if κ is a positive integer, say κ = n, then the integral can be expressed as a
sum.

Theorem 3.3.2 If X ~ GAM(θ, n), where n is a positive integer, then the CDF can be written

$$F(x; \theta, n)=1-\sum_{i=0}^{n-1}\frac{(x/\theta)^{i}}{i!}e^{-x/\theta} \tag{3.3.12}$$

Proof

This follows by repeated integration by parts on integral (3.3.10).

Notice that the terms in the sum in equation (3.3.12) resemble the terms of a
Poisson sum with μ replaced by x/θ.

Example 3.3.1 The daily amount (in inches) of measurable precipitation in a river valley is a
random variable X ~ GAM(0.2, 6). It might be of interest to know the
probability that the amount of precipitation will exceed some level, say 2 inches.
This would be

$$\begin{aligned}
P[X>2] &= \int_2^\infty \frac{1}{(0.2)^{6}\Gamma(6)}x^{5}e^{-x/0.2}\,dx\\
&= 1-F(2; 0.2, 6)\\
&= \sum_{i=0}^{5}\frac{10^{i}e^{-10}}{i!}=0.067
\end{aligned}$$

which can be found in Table 2 in Appendix C with μ = 10.
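Equation (3.3.12) turns the precipitation probability into a finite sum. The sketch below (function name ours) evaluates 1 − F(2; 0.2, 6) that way and, separately, as the Poisson probability P[N ≤ 5] with μ = 10, which is the quantity read from Table 2.

```python
import math

def gamma_cdf_integer_shape(x, theta, n):
    """F(x; theta, n) from (3.3.12); valid only when the shape n is a positive integer."""
    z = x / theta
    return 1 - math.exp(-z) * sum(z**i / math.factorial(i) for i in range(n))

# Precipitation example: X ~ GAM(0.2, 6); probability of exceeding 2 inches
tail = 1 - gamma_cdf_integer_shape(2, 0.2, 6)

# The same number read as a Poisson CDF, P[N <= 5] with mu = 2/0.2 = 10
poisson_cdf = sum(math.exp(-10) * 10**i / math.factorial(i) for i in range(6))
print(round(tail, 4), round(poisson_cdf, 4))   # 0.0671 0.0671
```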


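The mean of X ~ GAM(θ, κ) is obtained as follows:

$$\begin{aligned}
E(X) &= \int_0^\infty x\,\frac{1}{\theta^{\kappa}\Gamma(\kappa)}x^{\kappa-1}e^{-x/\theta}\,dx\\
&= \frac{1}{\theta^{\kappa}\Gamma(\kappa)}\int_0^\infty x^{(1+\kappa)-1}e^{-x/\theta}\,dx\\
&= \frac{\theta^{1+\kappa}\Gamma(1+\kappa)}{\theta^{\kappa}\Gamma(\kappa)}\int_0^\infty \frac{1}{\theta^{1+\kappa}\Gamma(1+\kappa)}x^{(1+\kappa)-1}e^{-x/\theta}\,dx\\
&= \frac{\theta\,\Gamma(1+\kappa)}{\Gamma(\kappa)}\\
&= \frac{\theta\,\kappa\Gamma(\kappa)}{\Gamma(\kappa)}\\
&= \kappa\theta
\end{aligned}$$

Similarly, E(X²) = θ²κ(1 + κ), and thus

$$\operatorname{Var}(X)=\theta^{2}\kappa(1+\kappa)-(\kappa\theta)^{2}=\kappa\theta^{2}$$

In the previous example, the daily amount of precipitation is gamma distributed
with mean of 1.2 inches and variance of 0.24.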
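Of course, the moments also can be obtained using the MGF,

$$\begin{aligned}
M_X(t) &= \int_0^\infty e^{tx}\frac{1}{\theta^{\kappa}\Gamma(\kappa)}x^{\kappa-1}e^{-x/\theta}\,dx\\
&= \frac{1}{\theta^{\kappa}\Gamma(\kappa)}\int_0^\infty x^{\kappa-1}e^{(t-1/\theta)x}\,dx
\end{aligned}$$

After the substitution u = −(t − 1/θ)x, we obtain

$$\begin{aligned}
M_X(t) &= \left(\frac{1}{\theta}-t\right)^{-\kappa}\frac{1}{\theta^{\kappa}\Gamma(\kappa)}\int_0^\infty u^{\kappa-1}e^{-u}\,du\\
&= (1-\theta t)^{-\kappa}, \qquad t<1/\theta
\end{aligned} \tag{3.3.13}$$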


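The rth derivative, in this case, is

$$\begin{aligned}
M_X^{(r)}(t) &= (\kappa+r-1)\cdots(\kappa+1)\kappa\,\theta^{r}(1-\theta t)^{-\kappa-r}\\
&= \frac{\Gamma(\kappa+r)}{\Gamma(\kappa)}\theta^{r}(1-\theta t)^{-\kappa-r}
\end{aligned}$$

and $M_X^{(r)}(0)$ yields the rth moment of X,

$$E(X^{r})=\frac{\Gamma(\kappa+r)}{\Gamma(\kappa)}\theta^{r} \tag{3.3.14}$$

Strictly speaking, this derivation is valid only if r is a positive integer, but it is
possible to show by a direct argument that (3.3.14) is valid for any real r > −κ.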


The power series has the form

$$M_X(t)=1+\sum_{r=1}^{\infty}\frac{\Gamma(\kappa+r)}{\Gamma(\kappa)}\,\frac{(\theta t)^{r}}{r!} \tag{3.3.15}$$
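Equation (3.3.14) and the MGF (3.3.13) can be cross-checked numerically for the precipitation model GAM(0.2, 6). In this sketch the derivatives M′(0) and M″(0) are approximated by central differences with step h = 1e-4, an arbitrary choice of ours; they should match the first two moments from (3.3.14).

```python
import math

def gamma_moment(r, theta, kappa):
    """E(X^r) = theta^r Gamma(kappa + r) / Gamma(kappa), equation (3.3.14); requires r > -kappa."""
    return theta**r * math.gamma(kappa + r) / math.gamma(kappa)

def gamma_mgf(t, theta, kappa):
    """M_X(t) = (1 - theta t)^(-kappa), equation (3.3.13); requires t < 1/theta."""
    return (1 - theta * t) ** (-kappa)

theta, kappa = 0.2, 6
mean = gamma_moment(1, theta, kappa)            # kappa * theta
var = gamma_moment(2, theta, kappa) - mean**2   # kappa * theta^2
print(mean, var)

# Central-difference approximations to M'(0) and M''(0)
h = 1e-4
m1 = (gamma_mgf(h, theta, kappa) - gamma_mgf(-h, theta, kappa)) / (2 * h)
m2 = (gamma_mgf(h, theta, kappa) - 2 * gamma_mgf(0, theta, kappa)
      + gamma_mgf(-h, theta, kappa)) / h**2
print(m1, m2, gamma_moment(2, theta, kappa))
```

The printed mean and variance are 1.2 and 0.24 up to floating-point rounding, as stated for the precipitation example.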
A special case of the gamma distribution with θ = 2 and κ = ν/2 is referred to
as a chi-square distribution with ν degrees of freedom; this distribution is
discussed in more detail in Chapter 8. It will be seen that cumulative chi-square
tables can be used to evaluate gamma cumulative probabilities.

When κ = 1, we obtain a special case known as the exponential distribution.


EXPONENTIAL DISTRIBUTION

A continuous random variable X has the exponential distribution with parameter
θ > 0 if it has a pdf of the form

$$f(x; \theta)=\frac{1}{\theta}e^{-x/\theta}, \qquad x>0 \tag{3.3.16}$$

and zero otherwise. The CDF of X is

$$F(x; \theta)=1-e^{-x/\theta}, \qquad x>0 \tag{3.3.17}$$

so that θ is a scale parameter.

The notation X ~ GAM(θ, 1) could be used to designate that X has pdf
(3.3.16), but a more common notation is

X ~ EXP(θ) (3.3.18)

The exponential distribution, which is an important probability model for
lifetimes, sometimes is characterized by a property that is given in the following
theorem.

Theorem 3.3.3 For a continuous random variable X, X ~ EXP(θ) if and only if

$$P[X>a+t\mid X>a]=P[X>t] \tag{3.3.19}$$

for all a > 0 and t > 0.

Proof (only if)

$$\begin{aligned}
P[X>a+t\mid X>a] &= \frac{P[X>a+t\text{ and }X>a]}{P[X>a]}\\
&= \frac{P[X>a+t]}{P[X>a]}\\
&= \frac{e^{-(a+t)/\theta}}{e^{-a/\theta}}\\
&= e^{-t/\theta}\\
&= P[X>t]
\end{aligned}$$
This shows that the exponential distribution satisfies property (3.3.19), which is
known as the no-memory property. We will not attempt to show that the
exponential distribution is the only such continuous distribution.

If X is the lifetime of a component, then property (3.3.19) asserts that the
probability that the component will last more than a + t time units given that it
has already lasted more than a units is the same as that of a new component
lasting more than t units. In other words, an old component that still works is
just as reliable as a new component. Failure of such a component is not the result
of fatigue or wearout.


Example 3.3.2 Suppose a certain solid-state component has a lifetime or failure time (in hours)
X ~ EXP(100). The probability that the component will last at least 50 hours is

$$P[X\ge 50]=1-F(50; 100)=e^{-0.5}=0.6065$$

It follows from the relationship to the gamma distribution that
E(X) = 1 · θ = θ and Var(X) = 1 · θ² = θ². Thus, in the previous example, the
mean lifetime of a component is μ = 100 hours, and the standard deviation, σ, is
also 100 hours.
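The component calculation and the no-memory property (3.3.19) take only a few lines to check. In this sketch the age a = 200 hours and the extra time t = 50 hours are hypothetical values chosen for illustration.

```python
import math

def exp_survival(x, theta):
    """P[X > x] = 1 - F(x; theta) = e^(-x/theta) for X ~ EXP(theta) and x >= 0."""
    return math.exp(-x / theta)

theta = 100.0                                  # lifetime X ~ EXP(100), in hours
print(round(exp_survival(50, theta), 4))       # P[X >= 50] = e^(-0.5) -> 0.6065

# No-memory property (3.3.19): P[X > a + t | X > a] = P[X > t]
a, t = 200.0, 50.0                             # hypothetical age and extra time, in hours
conditional = exp_survival(a + t, theta) / exp_survival(a, theta)
print(conditional, exp_survival(t, theta))     # both equal e^(-0.5), up to rounding
```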


The exponential distribution is also a special case of another important contin- 
uous distribution called the Weibull distribution. 


WEIBULL DISTRIBUTION


A widely used continuous distribution is named after the physicist W. Weibull, 
who suggested its use for numerous applications, including fatigue and break- 
ing strength of materials. It is also a very popular choice as a failure-time 
distribution. 

A continuous random variable X is said to have the Weibull distribution with
parameters θ > 0 and β > 0 if it has a pdf of the form

$$f(x; \theta, \beta)=\frac{\beta}{\theta^{\beta}}x^{\beta-1}e^{-(x/\theta)^{\beta}}, \qquad x>0 \tag{3.3.20}$$

and zero otherwise. A notation that designates that X has pdf (3.3.20) is

X ~ WEI(θ, β) (3.3.21)

The parameter β is called a shape parameter. This is similar to the situation we
encountered with the gamma distribution, because there are three basic shapes,
depending on whether β < 1, β = 1, or β > 1. This is illustrated in Figure 3.3,
which shows the graphs of pdf (3.3.20) for β = 0.5, 1, and 2.

Notice that the y-axis is an asymptote of y = f(x; θ, β) if β < 1, while
f(0; θ, 1) = 1/θ; and if β > 1, then f(0; θ, β) = 0.

One advantage of the Weibull distribution is that the CDF can be obtained
explicitly by integrating pdf (3.3.20):

$$F(x; \theta, \beta)=1-e^{-(x/\theta)^{\beta}}, \qquad x>0 \tag{3.3.22}$$

It is also clear that (3.3.22) can be written as F(x/θ; 1, β), which means that θ is a
scale parameter, as discussed earlier in the chapter.

The special case with β = 2 is known as the Rayleigh distribution.

FIGURE 3.3 The pdf's of Weibull distributions (curves f(x; θ, 0.5), f(x; θ, 1), and f(x; θ, 2))


Example 3.3.3 The distance (in inches) that a dart hits from the center of a target may be
modeled as a random variable X ~ WEI(10, 2). The probability of hitting within
five inches of the center is

$$P[X\le 5]=F(5; 10, 2)=1-e^{-(5/10)^{2}}=0.221$$
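Because the Weibull CDF (3.3.22) is explicit, the dart probability needs no tables. The sketch below also inverts (3.3.22) to obtain percentiles and checks the scale property; the function names are illustrative, not from the text.

```python
import math

def weibull_cdf(x, theta, beta):
    """F(x; theta, beta) = 1 - exp(-(x/theta)^beta) for x > 0, equation (3.3.22)."""
    return 1 - math.exp(-((x / theta) ** beta))

def weibull_percentile(p, theta, beta):
    """Solve F(x; theta, beta) = p for x: x_p = theta * (-ln(1 - p))^(1/beta)."""
    return theta * (-math.log(1 - p)) ** (1 / beta)

# Example 3.3.3: distance from the center, X ~ WEI(10, 2)
print(round(weibull_cdf(5, 10, 2), 3))          # P[X <= 5] -> 0.221
median = weibull_percentile(0.5, 10, 2)         # about 8.33 inches
print(median, weibull_cdf(median, 10, 2))       # second value is 0.5 up to rounding

# Scale parameter: F(x; theta, beta) = F(x/theta; 1, beta)
print(weibull_cdf(5, 10, 2), weibull_cdf(0.5, 1, 2))
```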


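The mean of X ~ WEI(θ, β) is obtained as follows:

$$\begin{aligned}
E(X) &= \int_0^\infty x\,\frac{\beta}{\theta^{\beta}}x^{\beta-1}e^{-(x/\theta)^{\beta}}\,dx\\
&= \frac{\beta}{\theta^{\beta}}\int_0^\infty x^{1+\beta-1}e^{-(x/\theta)^{\beta}}\,dx
\end{aligned}$$

Following the substitution t = (x/θ)^β, and some simplification,

$$E(X)=\theta\int_0^\infty t^{1/\beta}e^{-t}\,dt=\theta\,\Gamma\!\left(1+\frac{1}{\beta}\right)$$

Similarly, E(X²) = θ²Γ(1 + 2/β), and thus

$$\operatorname{Var}(X)=\theta^{2}\left[\Gamma\!\left(1+\frac{2}{\beta}\right)-\Gamma^{2}\!\left(1+\frac{1}{\beta}\right)\right]$$

It follows from equation (3.3.22) that the 100 × pth percentile has the form
$x_p=\theta[-\ln(1-p)]^{1/\beta}$.


PARETO DISTRIBUTION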


A continuous random variable X is said to have the Pareto distribution with
parameters θ > 0 and κ > 0 if it has a pdf of the form

$$f(x; \theta, \kappa)=\frac{\kappa}{\theta}\left(1+\frac{x}{\theta}\right)^{-(\kappa+1)}, \qquad x>0 \tag{3.3.23}$$

and zero otherwise. A notation to designate that X has pdf (3.3.23) is

X ~ PAR(θ, κ) (3.3.24)

The parameter κ is a shape parameter for this model, although there is not as
much variety in the possible basic shapes as we found with the gamma and
Weibull models. The CDF is given by

$$F(x; \theta, \kappa)=1-\left(1+\frac{x}{\theta}\right)^{-\kappa}, \qquad x>0 \tag{3.3.25}$$

Because equation (3.3.25) also can be expressed as F(x/θ; 1, κ), θ is a scale
parameter for the Pareto distribution.

The length of wire between flaws, X, discussed in Example 2.3.2 is an example 
of a Pareto distribution, namely X ~ PAR(1, 2), and the graph of f(x; 1, 2) is
shown in Figure 2.6. This model also has been used to model biomedical prob- 
lems, such as survival time following a heart transplant. 

Another related distribution, which is also sometimes referred to as the Pareto
distribution, has a pdf of the form

$$f(y)=\frac{\kappa}{a}\left(\frac{y}{a}\right)^{-(\kappa+1)}, \qquad y>a \tag{3.3.26}$$

and zero otherwise, where a > 0 and κ > 0.
For X ~ PAR(θ, κ), it is straightforward to show that E(X) = θ/(κ − 1) when κ > 1 and
Var(X) = θ²κ/[(κ − 2)(κ − 1)²] when κ > 2, and that the 100 × pth percentile is
$x_p=\theta[(1-p)^{-1/\kappa}-1]$.
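A sketch of the Pareto formulas for PAR(θ, κ) follows. The checks in the code reflect the fact that the mean is finite only for κ > 1 and the variance only for κ > 2; for the wire-length model PAR(1, 2) the mean is therefore 1 while the variance is not finite. Function names are our own.

```python
import math

def pareto_cdf(x, theta, kappa):
    """F(x; theta, kappa) = 1 - (1 + x/theta)^(-kappa) for x > 0, equation (3.3.25)."""
    return 1 - (1 + x / theta) ** (-kappa)

def pareto_percentile(p, theta, kappa):
    """x_p = theta[(1 - p)^(-1/kappa) - 1], the inverse of (3.3.25)."""
    return theta * ((1 - p) ** (-1 / kappa) - 1)

def pareto_mean(theta, kappa):
    if kappa <= 1:
        raise ValueError("E(X) is not finite unless kappa > 1")
    return theta / (kappa - 1)

def pareto_var(theta, kappa):
    if kappa <= 2:
        raise ValueError("Var(X) is not finite unless kappa > 2")
    return theta**2 * kappa / ((kappa - 2) * (kappa - 1) ** 2)

# Wire-length example: X ~ PAR(1, 2) has a finite mean but no finite variance
print(pareto_mean(1, 2))              # 1.0
print(pareto_percentile(0.5, 1, 2))   # median, sqrt(2) - 1
try:
    pareto_var(1, 2)
except ValueError as err:
    print(err)
```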


NORMAL DISTRIBUTION 


The normal distribution was first published by Abraham de Moivre in 1733 as an 
approximation for the distribution of the sum of binomial random variables. It is 
the single most important distribution in probability and statistics. 

A random variable X follows the normal distribution with mean μ and variance
σ² if it has the pdf

$$f(x; \mu, \sigma)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-[(x-\mu)/\sigma]^{2}/2} \tag{3.3.27}$$

for −∞ < x < ∞, where −∞ < μ < ∞ and 0 < σ < ∞. This is denoted by

X ~ N(μ, σ²) (3.3.28)

The normal distribution also is referred to frequently as the Gaussian distribution.

The normal distribution arises frequently in physical problems; there is a
theoretical reason for this, which will be developed in Chapter 7.

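First, we will verify that the normal pdf integrates to 1, and then we will verify
that the values of the parameters μ and σ² are indeed the mean and variance of
X.

Making a change of variable, z = (x − μ)/σ with dx = σ dz, gives

$$\int_{-\infty}^{\infty}f(x; \mu, \sigma)\,dx=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2}\,dz=\frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-z^{2}/2}\,dz$$

If we let w = z²/2, then $z=\sqrt{2w}$ and $dz=(w^{-1/2}/\sqrt{2})\,dw$, so

$$\frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-w}\frac{w^{-1/2}}{\sqrt{2}}\,dw=\frac{1}{\sqrt{\pi}}\int_0^\infty w^{1/2-1}e^{-w}\,dw=\frac{\Gamma(1/2)}{\sqrt{\pi}}=1$$

which follows from equation (3.3.7).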

The integrand obtained following the substitution z = (x − μ)/σ is an important
special case known as the standard normal pdf. We will adopt a special
notation for this pdf, namely

$$\phi(z)=\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2}, \qquad -\infty<z<\infty \tag{3.3.29}$$

If Z has pdf (3.3.29), then Z ~ N(0, 1), and the standard normal CDF is given
by

$$\Phi(z)=\int_{-\infty}^{z}\phi(t)\,dt \tag{3.3.30}$$


Some basic geometric properties of the standard normal pdf can be obtained
by the methods of calculus. Notice that

$$\phi(z)=\phi(-z) \tag{3.3.31}$$

for all real z, so φ(z) is an even function of z. In other words, the standard normal
distribution is symmetric about z = 0. Furthermore, because of the special form
of φ(z), we have

$$\phi'(z)=-z\,\phi(z) \tag{3.3.32}$$

and

$$\phi''(z)=(z^{2}-1)\phi(z) \tag{3.3.33}$$

Consequently, φ(z) has a unique maximum at z = 0 and inflection points at
z = ±1. Note also that φ(z) → 0 and $\phi'(z)=-z/[\sqrt{2\pi}\exp(z^{2}/2)]\to 0$ as
z → ±∞.

It is also possible, using equations (3.3.32) and (3.3.33), to find E(Z) and E(Z²).
Specifically,

$$E(Z)=\int_{-\infty}^{\infty}z\,\phi(z)\,dz=-\int_{-\infty}^{\infty}\phi'(z)\,dz=-\phi(z)\Big|_{-\infty}^{\infty}=0$$

and

$$\begin{aligned}
E(Z^{2}) &= \int_{-\infty}^{\infty}z^{2}\phi(z)\,dz=\int_{-\infty}^{\infty}[\phi''(z)+\phi(z)]\,dz\\
&= \phi'(z)\Big|_{-\infty}^{\infty}+\int_{-\infty}^{\infty}\phi(z)\,dz\\
&= 0+1=1
\end{aligned}$$


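Similar results follow for the more general case X ~ N(μ, σ²). Based on the
substitution z = (x − μ)/σ, we have

$$\begin{aligned}
E(X) &= \int_{-\infty}^{\infty}x\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]dx\\
&= \int_{-\infty}^{\infty}(\mu+\sigma z)\phi(z)\,dz\\
&= \mu\int_{-\infty}^{\infty}\phi(z)\,dz+\sigma\int_{-\infty}^{\infty}z\,\phi(z)\,dz\\
&= \mu
\end{aligned}$$

and

$$\begin{aligned}
E(X^{2}) &= \int_{-\infty}^{\infty}x^{2}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]dx\\
&= \int_{-\infty}^{\infty}(\mu+\sigma z)^{2}\phi(z)\,dz\\
&= \mu^{2}\int_{-\infty}^{\infty}\phi(z)\,dz+2\mu\sigma\int_{-\infty}^{\infty}z\,\phi(z)\,dz+\sigma^{2}\int_{-\infty}^{\infty}z^{2}\phi(z)\,dz\\
&= \mu^{2}+\sigma^{2}
\end{aligned}$$

It follows that Var(X) = E(X²) − μ² = (μ² + σ²) − μ² = σ².

FIGURE 3.4 A normal pdf

The graph of y = f(x; μ, σ) is shown in Figure 3.4.
The general topic of transformation of variables will be discussed later, but it is
convenient to consider the following theorem at this point.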


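Theorem 3.3.4 If X ~ N(μ, σ²), then

1. $Z=\dfrac{X-\mu}{\sigma}\sim N(0, 1)$

2. $F_X(x)=\Phi\!\left(\dfrac{x-\mu}{\sigma}\right)$ (3.3.34)

Proof

$$\begin{aligned}
F_Z(z) &= P[Z\le z]=P\!\left[\frac{X-\mu}{\sigma}\le z\right]=P[X\le\mu+z\sigma]\\
&= \int_{-\infty}^{\mu+z\sigma}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]dx
\end{aligned}$$

After the substitution w = (x − μ)/σ, we have

$$F_Z(z)=\int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-w^{2}/2}\,dw$$

Part 1 follows by differentiation, $f_Z(z)=F_Z'(z)=\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2}$.

We obtain Part 2 as follows:

$$F_X(x)=P[X\le x]=P\!\left[\frac{X-\mu}{\sigma}\le\frac{x-\mu}{\sigma}\right]=\Phi\!\left(\frac{x-\mu}{\sigma}\right) \tag{3.3.35}$$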


Standard normal cumulative probabilities, Φ(z), are provided in Table 3 in
Appendix C for positive values of z. Because of the symmetry of the normal
density, cumulative probabilities may be determined for negative values of z by
the relationship

$$\Phi(-z)=1-\Phi(z) \tag{3.3.36}$$

We will let $z_\gamma$ denote the γth percentile of the standard normal distribution,
$\Phi(z_\gamma)=\gamma$. For example, for γ = 0.95, $z_{0.95}=1.645$ from Table 3 (Appendix C). By
symmetry, we know that Φ(−1.645) = 1 − 0.95 = 0.05. That is, $z_{0.05}=-z_{1-0.05}$.
It follows that

$$P[z_{0.05}<Z<z_{0.95}]=0.95-0.05=0.90$$

or

$$P[-z_{1-0.05}<Z<z_{1-0.05}]=0.90 \tag{3.3.37}$$

Some authors find it more convenient to use $z_\alpha$ to denote the value that has α
area to the right; however, we will use the notation above, which is more
consistent with our notation for percentiles. Thus, in general,

$$P[-z_{1-\alpha/2}<Z<z_{1-\alpha/2}]=1-\alpha \tag{3.3.38}$$

which corresponds to equation (3.3.37) with α = 0.10 and where $z_{1-\alpha/2}=z_{0.95}=1.645$.
Similarly, if α = 0.05, then $z_{1-\alpha/2}=z_{0.975}=1.96$, and

$$P[-1.96<Z<1.96]=0.95$$

It often is useful to consider normal probabilities in terms of standard
deviations from the mean. For example, if X ~ N(μ, σ²), then

$$\begin{aligned}
P[\mu-1.96\sigma<X<\mu+1.96\sigma] &= F_X(\mu+1.96\sigma)-F_X(\mu-1.96\sigma)\\
&= \Phi\!\left(\frac{\mu+1.96\sigma-\mu}{\sigma}\right)-\Phi\!\left(\frac{\mu-1.96\sigma-\mu}{\sigma}\right)\\
&= \Phi(1.96)-\Phi(-1.96)\\
&= 0.95
\end{aligned}$$

That is, 95% of the area under a normal pdf is within 1.96 standard deviations of
the mean, 90% is within 1.645 standard deviations, and so on.
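Python's standard library includes a normal distribution class, `statistics.NormalDist` (Python 3.8 and later), whose `cdf` and `inv_cdf` methods play the roles of Φ and of the percentile $z_\gamma$. The sketch below reproduces the constants 1.645 and 1.96 and the 95% coverage statement; the values μ = 10 and σ = 3 are arbitrary.

```python
from statistics import NormalDist

Z = NormalDist()                    # standard normal, N(0, 1)
z95 = Z.inv_cdf(0.95)               # z_0.95
z975 = Z.inv_cdf(0.975)             # z_0.975
print(round(z95, 3), round(z975, 2))   # 1.645 1.96

# Symmetry (3.3.36): Phi(-z) = 1 - Phi(z)
print(Z.cdf(-1.645), 1 - Z.cdf(1.645))

# Coverage within 1.96 standard deviations does not depend on mu or sigma
X = NormalDist(mu=10.0, sigma=3.0)  # arbitrary illustrative values
inside = X.cdf(10.0 + 1.96 * 3.0) - X.cdf(10.0 - 1.96 * 3.0)
print(round(inside, 4))             # 0.95
```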


Example 3.3.4 Let X represent the lifetime in months of a battery, and assume that
approximately X ~ N(60, 36). The fraction of batteries that will fail within a four-year
warranty period is given by

$$P[X\le 48]=\Phi\!\left(\frac{48-60}{6}\right)=\Phi(-2)=0.0228$$

If one wished to know what warranty period would correspond to 5% failures,
then

$$P[X\le x_{0.05}]=\Phi\!\left(\frac{x_{0.05}-60}{6}\right)=0.05$$

which means that $(x_{0.05}-60)/6=-1.645$, and $x_{0.05}=-1.645(6)+60=50.13$
months.


In general, we see that the 100 × pth percentile is

$$x_p=\mu+z_p\sigma \tag{3.3.39}$$
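Example 3.3.4 can be reproduced with `statistics.NormalDist` (Python 3.8 or later). Note that the text writes N(μ, σ²) with the variance, whereas `NormalDist` takes the standard deviation, so N(60, 36) becomes `NormalDist(mu=60, sigma=6)`. The last line uses the library's own percentile method as a cross-check of (3.3.39).

```python
from statistics import NormalDist

# Example 3.3.4: X ~ N(60, 36), so mu = 60 months and sigma = sqrt(36) = 6
battery = NormalDist(mu=60, sigma=6)

print(round(battery.cdf(48), 4))        # P[X <= 48] = Phi(-2) -> 0.0228

# (3.3.39): x_p = mu + z_p * sigma, here with p = 0.05
x05 = 60 + NormalDist().inv_cdf(0.05) * 6
print(round(x05, 2))                    # 50.13
print(round(battery.inv_cdf(0.05), 2))  # same percentile computed directly
```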
In the previous example, note also that

$$P[X\le 0]=\Phi\!\left(\frac{0-60}{6}\right)=\Phi(-10)\approx 0$$

Thus, although the normal random variable theoretically takes on values over
the whole real line, it still may provide a reasonable model for a variable that
takes on only positive values, if very little probability is associated with the
negative values. Another possibility is to consider a truncated normal model when the
variable must be positive, although we need not bother with that here; the
probability assigned to the negative values is so small that the truncated model would
be essentially the same as the untruncated model. Of course, there is still no
guarantee that the normal model is a good choice for this variable. In particular,
the normal distribution is symmetric, and it is not uncommon for lifetimes to
follow skewed distributions. The question of model selection is a statistical topic
to be considered later, but we will not exclude the possibility of using a normal
model for positive variables in examples, with the understanding that it may be
approximating the more theoretically correct truncated normal model.
Some additional properties are given by the following theorem.

Theorem 3.3.5 If X ~ N(μ, σ²), then

$$M_X(t)=e^{\mu t+\sigma^{2}t^{2}/2} \tag{3.3.40}$$

$$E(X-\mu)^{2r}=\frac{(2r)!\,\sigma^{2r}}{r!\,2^{r}}, \qquad r=1,2,\ldots \tag{3.3.41}$$

$$E(X-\mu)^{2r-1}=0, \qquad r=1,2,\ldots \tag{3.3.42}$$

Proof

To show equation (3.3.40), we note that the MGF for a standard normal random
variable is given by

$$\begin{aligned}
M_Z(t) &= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{tz}e^{-z^{2}/2}\,dz\\
&= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-(z-t)^{2}/2}\,e^{t^{2}/2}\,dz=e^{t^{2}/2}
\end{aligned}$$

The integral of the first factor in the second integral is 1, because it is the
integral of a normal pdf with mean t and variance 1. Because X = σZ + μ,

$$M_X(t)=M_{\sigma Z+\mu}(t)=e^{\mu t}M_Z(\sigma t)=e^{\mu t+\sigma^{2}t^{2}/2}$$

Equations (3.3.41) and (3.3.42) follow from a series expansion:

$$\begin{aligned}
M_{X-\mu}(t) &= e^{\sigma^{2}t^{2}/2}=\sum_{r=0}^{\infty}\frac{(\sigma^{2}t^{2}/2)^{r}}{r!}\\
&= \sum_{r=0}^{\infty}\frac{(2r)!\,\sigma^{2r}}{r!\,2^{r}}\cdot\frac{t^{2r}}{(2r)!}
\end{aligned}$$

This expansion contains only even integer powers, and the coefficient of
$t^{2r}/(2r)!$ is the 2rth moment of (X − μ).
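Formulas (3.3.41) and (3.3.42) can be compared with a direct numerical integration of $(x-\mu)^k$ against the normal pdf. In this sketch the integral is truncated to ±12σ and evaluated by Simpson's rule (our choice of method); σ = 1.5 is arbitrary, and μ plays no role because the moments are central.

```python
import math

def normal_central_moment(k, sigma):
    """E(X - mu)^k for X ~ N(mu, sigma^2), from (3.3.41) and (3.3.42)."""
    if k % 2 == 1:
        return 0.0
    r = k // 2
    return math.factorial(2 * r) * sigma ** (2 * r) / (math.factorial(r) * 2**r)

def numeric_central_moment(k, sigma, n=20_000, width=12.0):
    """Simpson's rule for the integral of y^k times the N(0, sigma^2) pdf
    over [-width*sigma, width*sigma], where y = x - mu."""
    a = -width * sigma
    h = 2 * width * sigma / n
    f = lambda y: y**k * math.exp(-0.5 * (y / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
    total = f(a) + f(-a)
    total += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    total += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return total * h / 3

sigma = 1.5   # arbitrary illustrative value
for k in range(1, 7):
    print(k, normal_central_moment(k, sigma), round(numeric_central_moment(k, sigma), 6))
```

The even moments come out as σ², 3σ⁴, and 15σ⁶, and the odd ones vanish, as the series expansion predicts.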


The mean of a normal distribution is an example of a special type of parameter
known as a location parameter, and the standard deviation is a scale parameter.


3.4 LOCATION AND SCALE PARAMETERS


In each of the following definitions, $F_0(z)$ represents a completely specified CDF,
and $f_0(z)$ is the pdf.


Definition 3.4.1

Location Parameter A quantity η is a location parameter for the distribution of X
if the CDF has the form

$$F(x; \eta)=F_0(x-\eta) \tag{3.4.1}$$

In other words, the pdf has the form

$$f(x; \eta)=f_0(x-\eta) \tag{3.4.2}$$

Example 3.4.1 A distribution that often is encountered in life-testing applications has pdf

$$f(x; \eta)=e^{-(x-\eta)}, \qquad x>\eta$$

and zero otherwise. The location parameter, η, in this application usually is called
a threshold parameter because the probability of a failure before η is zero. This is
illustrated in Figure 3.5.

FIGURE 3.5 An exponential pdf with a threshold parameter


It is more common for a location parameter to be a measure of central
tendency of X, such as a mean or a median.


Example 3.4.2 Consider the pdf

$$f_0(z)=\frac{1}{2}e^{-|z|}, \qquad -\infty<z<\infty$$

If X has pdf of the form

$$f(x; \eta)=\frac{1}{2}e^{-|x-\eta|}, \qquad -\infty<x<\infty$$

then the location parameter, η, is the mean of the distribution. Because f(x; η) is
symmetric about η and has a unique maximum, η is also the median and the
mode in this example. This is illustrated in Figure 3.6.

FIGURE 3.6 A double-exponential pdf with a location parameter


The notion of a scale parameter was mentioned earlier in this chapter. A more 
precise definition now will be given. 


Definition 3.4.2 


Scale Parameter A positive quantity θ is a scale parameter for the distribution of
X if the CDF has the form

$$F(x; \theta)=F_0\!\left(\frac{x}{\theta}\right) \tag{3.4.3}$$

In other words, the pdf has the form

$$f(x; \theta)=\frac{1}{\theta}f_0\!\left(\frac{x}{\theta}\right) \tag{3.4.4}$$

A frequently encountered example of a random variable, the distribution of
which has a scale parameter, is X ~ EXP(θ).

The standard deviation, σ, often turns out to be a scale parameter, but
sometimes it is more convenient to use something else. For example, if
X ~ WEI(θ, 2), then θ is a scale parameter, but it is not the standard deviation of X.

Often, both types of parameters are required. 


Definition 3.4.3 


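Location-Scale Parameters Quantities η and θ > 0 are called location-scale
parameters for the distribution of X if the CDF has the form

$$F(x; \theta, \eta)=F_0\!\left(\frac{x-\eta}{\theta}\right) \tag{3.4.5}$$

In other words, the pdf has the form

$$f(x; \theta, \eta)=\frac{1}{\theta}f_0\!\left(\frac{x-\eta}{\theta}\right) \tag{3.4.6}$$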

The normal distribution is the most commonly encountered location-scale dis- 
tribution, but there are other important examples. 


Example 3.4.3 Consider a pdf of the form

$$f_0(z)=\frac{1}{\pi(1+z^{2})}, \qquad -\infty<z<\infty \tag{3.4.7}$$

If X has pdf of the form $(1/\theta)f_0[(x-\eta)/\theta]$, with $f_0(z)$ given by equation (3.4.7),
then X is said to have the Cauchy distribution with location-scale parameters η
and θ, denoted

X ~ CAU(θ, η) (3.4.8)

It is easy to show that the mean and variance of X do not exist, so η and θ
cannot be related to a mean and standard deviation. We still can interpret η as
either the median or mode.
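Definition 3.4.3 says that a location-scale family is generated from a single standard pdf $f_0$. The sketch below builds such a family as a Python closure; the helper names are hypothetical. It checks the normal case against `statistics.NormalDist` and evaluates the Cauchy CDF, using the standard closed form $F_0(z)=1/2+\arctan(z)/\pi$ for (3.4.7), to confirm that η is the median.

```python
import math
from statistics import NormalDist

def location_scale_pdf(f0, eta, theta):
    """Return x -> (1/theta) f0((x - eta)/theta), the pdf form of Definition 3.4.3."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return lambda x: f0((x - eta) / theta) / theta

def location_scale_cdf(F0, eta, theta):
    """Return x -> F0((x - eta)/theta), the CDF form (3.4.5)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return lambda x: F0((x - eta) / theta)

# Normal case: standard normal f0 with eta = mu and theta = sigma gives N(mu, sigma^2)
f = location_scale_pdf(NormalDist().pdf, eta=5.0, theta=2.0)
print(f(6.0), NormalDist(5.0, 2.0).pdf(6.0))   # the two values agree

# Cauchy case: f0 from (3.4.7), whose CDF is 1/2 + arctan(z)/pi
cauchy_f0 = lambda z: 1.0 / (math.pi * (1.0 + z * z))
cauchy_F0 = lambda z: 0.5 + math.atan(z) / math.pi
g = location_scale_pdf(cauchy_f0, eta=3.0, theta=0.5)
F = location_scale_cdf(cauchy_F0, eta=3.0, theta=0.5)
print(F(3.0), F(3.5))   # 0.5 and 0.75: eta is the median, eta + theta the upper quartile
print(g(3.0))           # peak height 1/(pi*theta), attained at the mode eta
```

Writing the family as a wrapper around $f_0$ mirrors the definition directly: changing η shifts the curve and changing θ stretches it, without touching the standard pdf.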


Another location-scale distribution, which is frequently encountered in
life-testing applications, has pdf

$$f(x; \theta, \eta)=\frac{1}{\theta}\exp\!\left(-\frac{x-\eta}{\theta}\right), \qquad x>\eta \tag{3.4.9}$$

and zero otherwise. This is called the two-parameter exponential distribution,
denoted by

X ~ EXP(θ, η) (3.4.10)

A location-scale distribution based on the pdf $f_0(z)$ of Example 3.4.2 is called
the Laplace or double-exponential distribution, denoted by

X ~ DE(θ, η) (3.4.11)

It also is possible to define three-parameter models if we replace $f_0(z)$ with
a pdf, $f_0(z; \beta)$, that depends on another parameter, say β. For example, if
Z ~ WEI(1, β), then X has the three-parameter Weibull distribution, with
location-scale parameters η and θ and shape parameter β if its pdf is of the form
$f(x; \theta, \eta, \beta)=(1/\theta)f_0[(x-\eta)/\theta; \beta]$. Similarly, if Z ~ GAM(1, κ), then X has the
three-parameter gamma distribution. These are denoted, respectively, by
X ~ WEI(θ, η, β) and X ~ GAM(θ, η, κ).


SUMMARY 


The purpose of this chapter was to develop special probability distributions.
Special discrete distributions, such as the binomial, hypergeometric, negative
binomial, and Poisson distributions, provide useful models for experiments that
involve counting or other integer-valued responses. Special continuous
distributions, such as the uniform, exponential, gamma, Weibull, and normal
distributions, provide useful models when experiments involve measurements on
a continuous scale such as time, length, or weight.


EXERCISES


An office has 10 dot matrix printers. Each requires a new ribbon approximately every 

seven weeks. If the stock clerk finds at the beginning of a certain week that there are only 
five ribbons in stock, what is the probability that the supply will be exhausted during that 
week? 


In a 10-question true-—false test: 
(a) What is the probability of getting all answers correct by guessing? 
(b) What is the probability of getting eight correct by guessing? 


A basketball player shoots 10 shots and the probability of hitting is 0.5 on each shot. 
(a) What is the probability of hitting eight shots? 
(b) What is the probability of hitting eight shots if the probability on each shot is 0.6? 
(c) What are the expected value and variance of the number of shots hit if p = 0.5? 


4. A four-engine plane can fly if at least two engines work.
(a) If the engines operate independently and each malfunctions with probability q, what is the probability that the plane will fly safely?
(b) A two-engine plane can fly if at least one engine works. If an engine malfunctions with probability q, what is the probability that the plane will fly safely?
(c) Which plane is safer?


5. (a) The Chevalier de Méré used to bet that he would get at least one 6 in four rolls of a die. Was this a good bet?

(b) He also bet that he would get at least one pair of 6's in 24 rolls of two dice. What was his probability of winning this bet?

(c) Compare the probability of at least one 6 when six dice are rolled with the probability of at least two 6's when 12 dice are rolled.


6. If the probability of picking a winning horse in a race is 0.2, and if X is the number of winning picks out of 20 races, what is:
(a) P[X = 4].
(b) P[X < 4].
(c) E(X) and Var(X).


7. If X ~ BIN(n, p), derive E(X) using Definition 2.2.3.


8. A jar contains 30 green jelly beans and 20 purple jelly beans. Suppose 10 jelly beans are selected at random from the jar.
(a) Find the probability of obtaining exactly five purple jelly beans if they are selected 
with replacement. 
(b) Find the probability of obtaining exactly five purple beans if they are selected 
without replacement. 


9. An office has 10 employees, three men and seven women. The manager chooses four at
random to attend a short course on quality improvement. 




(a) What is the probability that an equal number of men and women are chosen? 

(b) What is the probability that more women than men are chosen?
10. Five cards are drawn without replacement from a regular deck of 52 cards. Give the
probability of each of the following events: 

(a) exactly two aces. 

(b) exactly two kings. 

(c) fewer than two aces.

(d) at least two aces. 
11. A shipment of 50 mechanical devices consists of 42 good ones and eight defective ones. An
inspector selects five devices at random without replacement. 

(a) What is the probability that exactly three are good? 

(b) What is the probability that at most three are good? 
12. Repeat Exercise 10 if the cards are drawn with replacement.
13. A man pays $1 a throw to try to win a $3 Kewpie doll. His probability of winning on each throw is 0.1.

(a) What is the probability that two throws will be required to win the doll?

(b) What is the probability that x throws will be required to win the doll?

(c) What is the probability that more than three throws will be required to win the doll?

(d) What is the expected number of throws needed to win a doll?
14. Three men toss coins to see who pays for coffee. If all three match, they toss again.
Otherwise, the “odd man” pays for coffee. 

(a) What is the probability that they will need to do this more than once? 

(b) What is the probability of tossing at most twice? 


15. The man in Exercise 13 has three children, and he must win a Kewpie doll for each one.
(a) What is the probability that 10 throws will be required to win the three dolls? 
(b) What is the probability that at least four throws will be required? 

(c) What is the expected number of throws needed to win three dolls? 


16. Consider a seven-game World Series between team A and team B, where for each game P(A wins) = 0.6.
(a) Find P(A wins series in x games).
(b) You hold a ticket for the seventh game. What is the probability that you will get to use it?
(c) If P(A wins a game) = p, what value of p maximizes your chance in (b)?
(d) What is the most likely number of games to be played in the series for p = 0.6?


17. The probability of a successful missile launch is 0.9. Test launches are conducted until
three successful launches are achieved. What is the probability of each of the following? 
(a) Exactly six launches will be required. 
(b) Fewer than six launches will be required. 
(c) At least four launches will be required. 
18. Let X ~ GEO(p).
(a) Derive the MGF of X.
(b) Find the FMGF of X.
(c) Find E(X).
(d) Find E[X(X − 1)].
(e) Find Var(X).


19. Let X ~ NB(r, p).
(a) Derive the MGF of X. 
(b) Find E(X). 
(c) Find Var(X). 


20. Suppose an ordinary six-sided die is rolled repeatedly, and the outcome (1, 2, 3, 4, 5, or 6) is noted on each roll.
(a) What is the probability that the third 6 occurs on the seventh roll?
(b) What is the probability that the number of rolls until the first 6 occurs is at most 10?


21. The number of calls that arrive at a switchboard during one hour is Poisson distributed with mean μ = 10. Find the probability of occurrence during an hour of each of the following events:

(a) Exactly seven calls arrive.

(b) At most seven calls arrive.

(c) Between three and seven calls (inclusive) arrive.


22. If X has a Poisson distribution and if P[X = 0] = 0.2, find P[X > 4].


23. A certain assembly line produces electronic components, and defective components occur independently with probability 0.01. The assembly line produces 500 components per hour.
(a) For a given hour, what is the probability that the number of defective components is at most two?
(b) Give the Poisson approximation for (a).


24. The probability that a certain type of electronic component will fail during the first hour
of operation is 0.005. If 400 components are tested independently, find the Poisson 
approximation of the probability that at most two will fail during the first hour. 


25. Suppose that 3% of the items produced by an assembly line are defective. An inspector selects 100 items at random from the assembly line. Approximate the probability that exactly five defectives are selected.


26. The number of vehicles passing a certain intersection in the time interval [0, t] is a Poisson process X(t) with mean E[X(t)] = 3t, where the unit of time is minutes.
(a) Find the probability that at least two vehicles will pass during a given minute.
(b) Define the events A = [at least four vehicles pass during the first minute] and B = [at most two vehicles pass during the second minute]. Find the probability that both A and B occur.




27. Let X ~ POI(μ).
(a) Find the factorial moment generating function (FMGF) of X, Gₓ(t).
(b) Use Gₓ(t) to find E(X).
(c) Use Gₓ(t) to find E[X(X − 1)].


28. Suppose that X ~ POI(10).
(a) Find P[5 < X < 15].
(b) Use the Chebychev Inequality to find a lower bound for P[5 < X < 15].
(c) Find a lower bound for P[1 − k < X/μ < 1 + k] for arbitrary k > 0.


29. A 20-sided (icosahedral) die has each face marked with a different integer from 1 through 20. Assuming that each face is equally likely to occur on a single roll, the outcome is a random variable X ~ DU(20).

(a) If the die is rolled twice, find the pdf of the smallest value obtained, say Y.

(b) If the die is rolled three times, find the probability that the largest value is 3.

(c) Find E(X) and Var(X).


30. Let X ~ DU(N). Derive the MGF of X. Hint: Make use of the identity s + s² + ··· + sᴺ = s(1 − sᴺ)/(1 − s) for s ≠ 1.
31. Let X ~ UNIF(a, b). Derive the MGF of X.
32. The hardness of a certain alloy (measured on the Rockwell scale) is a random variable X. Assume that X ~ UNIF(50, 75).
(a) Give the CDF of X.
(b) Find P[60 < X < 70].
(c) Find E(X).
(d) Find Var(X).
33. If Q ~ UNIF(0, 3), find the probability that the roots of the equation g(t) = 0 are real, where g(t) = 4t² + 4Qt + Q + 2.
34. Suppose a value x is chosen "at random" in the interval [0, 10]. In other words, x is an observed value of a random variable X ~ UNIF(0, 10). The value x divides the interval [0, 10] into two subintervals.
(a) Find the CDF of the length of the shorter subinterval. 
(b) What is the probability that the ratio of lengths of the shorter to the longer 
subinterval is less than 1/4? 


35. Prove that Γ(1/2) = √π. Hint: Use the following steps.
(1) Make the substitution x = √t in the integral

$$\Gamma(1/2) = \int_0^\infty t^{-1/2} e^{-t}\, dt$$

(2) Change to polar coordinates in the double integral

$$[\Gamma(1/2)]^2 = 4 \int_0^\infty \int_0^\infty \exp[-(x^2 + y^2)]\, dx\, dy$$
36. Use the properties of Theorem 3.3.1 to find each of the following:
(a) T0). 
(b) Γ(5/2).


(c) Give an expression for the binomial coefficient

$$\binom{n}{x}$$

in terms of the gamma function.


37. The survival time (in days) of a white rat that was subjected to a certain level of X-ray radiation is a random variable X ~ GAM(5, 4). Use Theorem 3.3.2 to find:

(a) P[X < 15].
(b) P[15 < X < 20].
(c) Find the expected survival time, E(X).


38. The time (in minutes) until the third customer of the day enters a store is a random variable X ~ GAM(1, 3). If the store opens at 8 A.M., find the probability that:


(a) the third customer arrives between 8:05 and 8:10; 
(b) the third customer arrives after 8:10; 
(c) Sketch the graph of the pdf of X. 


39. Suppose that for the variable Q of Exercise 33, instead of a uniform distribution we assume
Q ~ EXP(1.5). Find the probability that the roots of g(t) = 0 are real. 


40. Assume that the time (in hours) until failure of a transistor is a random variable X ~ EXP(100).
(a) Find the probability that X > 15. 
(b) Find the probability that X > 110. 
(c) It is observed after 95 hours that the transistor still is working. Find the conditional 
probability that X > 110. How does this compare to (a)? Explain this result. 


(d) What is Var(X)? 
41. If X ~ GAM(1, 2), find the mode of X.


42. For a switchboard, suppose the time X (in minutes) until the third call of the day arrives is gamma distributed with scale parameter θ = 2 and shape parameter κ = 3. If the switchboard is activated at 8 A.M., find the probability that the third call arrives before 8:06 A.M.


43. If X ~ WEI(θ, β), derive E(Xᵏ) assuming that k > −β.


44. Suppose X ~ PAR(θ, κ).
(a) Derive E(X); κ > 1.
(b) Derive E(X²); κ > 2.


45. If X ~ PAR(100, 3), find E(X) and Var(X).


46. The shear strength (in pounds) of a spot weld is a Weibull distributed random variable, X ~ WEI(400, 2/3).
(a) Find P[X > 410].
(b) Find the conditional probability P[X > 410 | X > 390].
(c) Find E(X).
(d) Find Var(X).


47. The distance (in meters) that a bomb hits from the center of a target area is a random variable X ~ WEI(10, 2).
(a) Find the probability that the bomb hits at least 20 meters from the center of the target.
(b) Sketch the graph of the pdf of X.
(c) Find E(X) and Var(X).


48. Suppose that X ~ PAR(θ, κ).
(a) Derive the 100 × pth percentile of X.
(b) Find the median of X if θ = 10 and κ = 2.


49. Rework Exercise 37 assuming that, rather than being gamma distributed, the survival time is a random variable X ~ PAR(4, 1.2).


50. Rework Exercise 40 assuming that, rather than being exponential, the failure time has a
Pareto distribution X ~ PAR(100, 2). 


51. Suppose that Z ~ N(0, 1). Find the following probabilities:
(a) P(Z < 1.53).
(b) P(Z > −0.49).
(c) P(0.35 < Z < 2.01).
(d) P(|Z| > 1.28).
Find the values a and b such that:
(e) P(Z < a) = 0.648.
(f) P(|Z| < b) = 0.95.


52. Suppose that X ~ N(3, 0.16). Find the following probabilities:
(a) P(X > 3).
(b) P(X > 3.3).
(c) P(2.8 < X < 3.1).
(d) Find the 98th percentile of X.
(e) Find the value c such that P(3 − c < X < 3 + c) = 0.90.


53. The Rockwell hardness of a metal specimen is determined by impressing the surface of the
specimen with a hardened point, and then measuring the depth of penetration. The 
hardness of a certain alloy is normally distributed with mean of 70 units and standard 
deviation of 3 units. 
(a) If a specimen is acceptable only if its hardness is between 66 and 74 units, what is 
the probability that a randomly chosen specimen is acceptable? 
(b) If the acceptable range is 70 ± c, for what value of c would 95% of all specimens be acceptable?
54. Suppose that X ~ N(10, 16). Find:
(a) P[X < 14].
(b) P[4 < X < 18].
(c) P[2X − 10 < 18].
(d) x₀.₉₅, the 95th percentile of X.


55. Assume the amount of light X (in lumens) produced by a certain type of light bulb is normally distributed with mean μ = 350 and variance σ² = 400.
(a) Find P[325 < X < 363]. 
(b) Find the value c such that the amount of light produced by 90% of the light bulbs 
will exceed c lumens. 


56. Suppose that X ~ N(1, 2).
(a) Find E(X — 1)*. 
(b) Find E(x). 


57. Suppose the computer store in Exercise 26 of Chapter 2 expands its marketing operation and orders 10 copies of the software package. As before, the annual demand is a random variable, X, and unsold copies are discarded; but assume now that X ~ BIN(10, p).

(a) Find the expected net profit to the store as a function of p.
(b) How large must p be to produce a positive expected net profit?

(c) If instead X ~ POI(2), would the store make a greater expected net profit by ordering more copies of the software?


58. Consider the following continuous analog of Exercise 57. Let X represent the annual demand for some commodity that is measured on a continuous scale, such as a liquid pesticide which can be measured in gallons (or fractions thereof). At the beginning of the year, a farm-supply store orders c gallons at d₁ dollars per gallon and sells it to customers at d₂ dollars per gallon. The pesticide loses effectiveness if it is stored during the off-season, so any amount unsold at the end of the year is a loss.

(a) If S is the amount sold, show that

$$E(S) = \int_0^c x f(x)\, dx + c[1 - F(c)]$$

(b) Show that the amount c that maximizes the expected net profit is the 100 × pth percentile of X with p = (d₂ − d₁)/d₂.
(c) If d₁ = 6, d₂ = 14, and X ~ UNIF(980, 1020), find the optimum choice for c.
(d) Rework (c) if, instead, X ~ N(1000, 100).


59. The solution of Exercise 58 can be extended to the discrete case. Suppose now that X is discrete as in Exercise 57, and the store pays d₁ dollars per copy and charges each customer d₂ dollars per copy. Furthermore, let the demand X be an arbitrary nonnegative integer-valued random variable, with pdf f(x) and CDF F(x). Again, let c be the number of copies ordered by the store.
(a) Show that

$$E(S) = \sum_{x=0}^{c} x f(x) + c[1 - F(c)]$$

(b) Express the net profit Y as a linear function of S, and find E(Y).

(c) Verify that the solution that maximizes E(Y) is the smallest integer c such that F(c) > (d₂ − d₁)/d₂. Hint: Note that the expected net profit is a function of c, say g(c) = E(Y), and the optimum solution will be the smallest c such that g(c + 1) − g(c) < 0.

(d) If X ~ BIN(10, 1/2), d₁ = 10, and d₂ = 35, find the optimum solution.

(e) Rework (d) if, instead, X ~ POI(5).

(e) Rework (d) if, instead, X ~ POI(5). 


JOINT DISTRIBUTIONS


4.1 


INTRODUCTION 


In many applications there will be more than one random variable of interest, say X₁, X₂, ..., Xₖ. It is convenient mathematically to regard these variables as components of a k-dimensional vector, X = (X₁, X₂, ..., Xₖ), which is capable of assuming values x = (x₁, x₂, ..., xₖ) in a k-dimensional Euclidean space. Note, for example, that an observed value x may be the result of measuring k characteristics once each, or the result of measuring one characteristic k times. That is, in the latter case x could represent the outcomes on k repeated trials of an experiment concerning a single variable.
As before, we will develop the discrete and continuous cases separately. 
Definition 4.2.1

The joint probability density function (joint pdf) of the k-dimensional discrete random variable X = (X₁, X₂, ..., Xₖ) is defined to be

$$f(x_1, x_2, \ldots, x_k) = P[X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k] \tag{4.2.1}$$

for all possible values x = (x₁, x₂, ..., xₖ) of X.

In this context, the notation [X₁ = x₁, X₂ = x₂, ..., Xₖ = xₖ] represents the intersection of k events [X₁ = x₁] ∩ [X₂ = x₂] ∩ ··· ∩ [Xₖ = xₖ]. Another notation for the joint pdf involves subscripts, namely f_{X₁, X₂, ..., Xₖ}(x₁, x₂, ..., xₖ). This notation is a bit more cumbersome, and we will use it only when necessary.

Example 4.2.1 Recall in Example 3.2.6 that a bin contained 1000 flower seeds and 400 were red flowering seeds. Of the remaining seeds, 400 are white flowering and 200 are pink flowering. If 10 seeds are selected at random without replacement, then the number of red flowering seeds, X₁, and the number of white flowering seeds, X₂, in the sample are jointly distributed discrete random variables.

The joint pdf of the pair (X₁, X₂) is obtained easily by the methods of Section 1.6. Specifically,

$$f(x_1, x_2) = \frac{\binom{400}{x_1}\binom{400}{x_2}\binom{200}{10 - x_1 - x_2}}{\binom{1000}{10}} \tag{4.2.2}$$

for all 0 ≤ x₁, 0 ≤ x₂, and x₁ + x₂ ≤ 10. The probability of obtaining exactly two red, five white, and three pink flowering seeds is f(2, 5) = 0.0331. Notice that once the values of x₁ and x₂ are specified, the number of pink is also determined, namely 10 − x₁ − x₂, so it suffices to consider only two variables.
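The following is a small sketch that evaluates (4.2.2) exactly with Python's `math.comb` and `fractions.Fraction`. It confirms that f(2, 5) rounds to 0.0331 and that the probabilities over the whole support add to 1. The names `seed_pdf`, `RED`, and so on are illustrative.

```python
from fractions import Fraction
from math import comb

RED, WHITE, PINK, TOTAL, N_DRAWN = 400, 400, 200, 1000, 10


def seed_pdf(x1: int, x2: int) -> Fraction:
    """Joint pdf (4.2.2): x1 red and x2 white among 10 seeds drawn without replacement."""
    x3 = N_DRAWN - x1 - x2  # the number of pink seeds is then determined
    if x1 < 0 or x2 < 0 or x3 < 0:
        return Fraction(0)
    return Fraction(comb(RED, x1) * comb(WHITE, x2) * comb(PINK, x3), comb(TOTAL, N_DRAWN))


print(float(seed_pdf(2, 5)))
# Summing over 0 <= x1, 0 <= x2, x1 + x2 <= 10 gives exactly 1.
print(sum(seed_pdf(a, b) for a in range(11) for b in range(11 - a)))
```

Exact fractions are used so that the total comes out as exactly 1 and not as a float that is merely close to 1.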


This is a special case of a more general type of hypergeometric distribution.


EXTENDED HYPERGEOMETRIC DISTRIBUTION 


The hypergeometric distribution of equation (3.2.10) can be generalized to apply 
in cases where there are more than two types of outcomes of interest. 
Suppose that a collection consists of a finite number of items, N, and that there 
are k + 1 different types: M₁ of type 1, M₂ of type 2, and so on. Select n items at random without replacement, and let Xᵢ be the number of items of type i that are selected. The vector X = (X₁, X₂, ..., Xₖ) has an extended hypergeometric distribution and a joint pdf of the form

$$f(x_1, x_2, \ldots, x_k) = \frac{\binom{M_1}{x_1}\binom{M_2}{x_2}\cdots\binom{M_k}{x_k}\binom{M_{k+1}}{x_{k+1}}}{\binom{N}{n}} \tag{4.2.3}$$

for all 0 ≤ xᵢ ≤ Mᵢ, where Mₖ₊₁ = N − (M₁ + M₂ + ··· + Mₖ) and xₖ₊₁ = n − (x₁ + x₂ + ··· + xₖ). A special notation for this is

$$X \sim \mathrm{HYP}(n, M_1, M_2, \ldots, M_k, N) \tag{4.2.4}$$

Note that only k random variables are involved here, and xₖ₊₁ is used only as a notational convenience. The corresponding problem when the items are selected with replacement can be solved with a more general form of the binomial distribution known as the multinomial distribution.
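A general version of (4.2.3) for any number of types might look like the sketch below. It is a hypothetical helper, not from the text. The (k + 1)st type is handled implicitly, just as in the notation HYP(n, M₁, ..., Mₖ, N).

```python
from fractions import Fraction
from math import comb, prod


def ext_hyp_pdf(x, M, N, n):
    """Extended hypergeometric pdf (4.2.3) for X ~ HYP(n, M_1, ..., M_k, N).

    x = (x_1, ..., x_k) and M = (M_1, ..., M_k); the (k+1)st type holds the rest:
    M_{k+1} = N - sum(M) items, of which x_{k+1} = n - sum(x) are selected.
    """
    m_rest = N - sum(M)
    if m_rest < 0:
        raise ValueError("type counts M_i exceed the population size N")
    x_rest = n - sum(x)
    if x_rest < 0 or any(xi < 0 for xi in x):
        return Fraction(0)
    # comb(m, j) is 0 when j > m, so impossible selections get probability 0.
    numerator = prod(comb(Mi, xi) for Mi, xi in zip(M, x)) * comb(m_rest, x_rest)
    return Fraction(numerator, comb(N, n))


# Small check: N = 12 items of three types (5, 4, and the remaining 3), n = 4 drawn.
total = sum(ext_hyp_pdf((a, b), (5, 4), 12, 4) for a in range(5) for b in range(5))
print(total)
```

Called as `ext_hyp_pdf((2, 5), (400, 400), 1000, 10)`, the function reproduces the flower-seed value from (4.2.2).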


MULTINOMIAL DISTRIBUTION 


Suppose that there are k + 1 mutually exclusive and exhaustive events, say E₁, E₂, ..., Eₖ, Eₖ₊₁, which can occur on any trial of an experiment, and let pᵢ = P(Eᵢ) for i = 1, 2, ..., k + 1. On n independent trials of the experiment, we let Xᵢ be the number of occurrences of the event Eᵢ. The vector X = (X₁, X₂, ..., Xₖ) is said to have the multinomial distribution, which has a joint pdf of the form

$$f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_{k+1}!}\, p_1^{x_1} p_2^{x_2} \cdots p_{k+1}^{x_{k+1}} \tag{4.2.5}$$

for all 0 ≤ xᵢ ≤ n, where xₖ₊₁ = n − (x₁ + x₂ + ··· + xₖ) and pₖ₊₁ = 1 − (p₁ + p₂ + ··· + pₖ).

A special notation for this is

$$X \sim \mathrm{MULT}(n, p_1, p_2, \ldots, p_k) \tag{4.2.6}$$


The rationale for equation (4.2.5) is similar to that of the binomial distribution. 
To have exactly xᵢ occurrences of Eᵢ for each i, it is necessary to have some permutation of x₁ E₁'s, x₂ E₂'s, and so on. The total number of such permutations is

$$\frac{n!}{x_1!\,x_2!\cdots x_{k+1}!}$$

and each permutation occurs with probability

$$p_1^{x_1} p_2^{x_2} \cdots p_{k+1}^{x_{k+1}}$$

Just as the binomial provides an approximation to the hypergeometric distribution, under certain conditions equation (4.2.5) approximates equation (4.2.3). In Example 4.2.1, let us approximate the value of f(2, 5) with (4.2.5), where p₁ = p₂ = 0.4 and p₃ = 0.2. This yields an approximate value of 0.0330, which agrees with the exact answer to three decimal places. Actually, this corresponds to the situation of sampling with replacement. In other words, if n is small relative to N and to the values of Mᵢ, then the effect of replacement or nonreplacement is negligible.
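The comparison between the exact and approximate values of f(2, 5) can be reproduced in a few lines. The code computes the exact value from (4.2.2) and the with-replacement value from (4.2.5). It uses only Python's standard `math` module.

```python
from math import comb, factorial

# Exact value (4.2.2): 10 seeds drawn without replacement from 400 red, 400 white, 200 pink.
exact = comb(400, 2) * comb(400, 5) * comb(200, 3) / comb(1000, 10)

# Multinomial approximation (4.2.5) with p1 = p2 = 0.4 and p3 = 0.2,
# i.e. the with-replacement model.
coef = factorial(10) // (factorial(2) * factorial(5) * factorial(3))
approx = coef * 0.4**2 * 0.4**5 * 0.2**3

print(f"exact = {exact:.4f}, multinomial approximation = {approx:.4f}")
```

The two values differ only in the fourth decimal place because n = 10 is small compared with N = 1000 and with each Mᵢ.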


The four-sided die of Example 2.1.1 is rolled 20 times, and the number of occurrences of each side is recorded. The probability of obtaining four 1's, six 2's, five 3's, and five 4's can be computed from (4.2.5) with pᵢ = 0.25, namely

$$\frac{20!}{4!\,6!\,5!\,5!}(0.25)^{20} = 0.0089$$

If we were concerned only with recording 1's, 3's, and even numbers, then equation (4.2.5) would apply with p₁ = p₃ = 0.25 and 1 − p₁ − p₃ = 0.5. The probability of four 1's, five 3's, and 11 even numbers would be

$$\frac{20!}{4!\,5!\,11!}(0.25)^{9}(0.5)^{11} = 0.0394$$
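Both die probabilities follow from one small implementation of (4.2.5). In this sketch the last category is written out explicitly instead of being left implicit, and the helper name `multinomial_pdf` is hypothetical.

```python
from math import factorial, prod


def multinomial_pdf(counts, probs):
    """Multinomial pdf (4.2.5) with every category, including the last, listed explicitly."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # n! / (x_1! x_2! ... x_{k+1}!), exact integer arithmetic
    return coef * prod(p**c for p, c in zip(probs, counts))


# Four 1's, six 2's, five 3's, five 4's in 20 rolls of a fair four-sided die.
print(round(multinomial_pdf([4, 6, 5, 5], [0.25] * 4), 4))
# Four 1's, five 3's, and 11 even numbers.
print(round(multinomial_pdf([4, 5, 11], [0.25, 0.25, 0.5]), 4))
```

Merging the 2's and 4's into a single "even" category just adds their probabilities, so p = 0.5, and the counts go into one cell.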


The functions defined by equations (4.2.3) and (4.2.5) both sum to 1 when summed over all possible values of x = (x₁, x₂, ..., xₖ), and both are nonnegative. This is necessary to define a discrete pdf.


A function f(x₁, x₂, ..., xₖ) is the joint pdf for some vector-valued random variable X = (X₁, X₂, ..., Xₖ) if and only if the following properties are satisfied:

$$f(x_1, x_2, \ldots, x_k) \ge 0 \quad \text{for all possible values } (x_1, x_2, \ldots, x_k) \tag{4.2.7}$$

and

$$\sum_{x_1} \cdots \sum_{x_k} f(x_1, x_2, \ldots, x_k) = 1 \tag{4.2.8}$$


In some two-dimensional problems it is convenient to present the joint pdf in a tabular form, particularly if a simple functional form for the joint pdf f(x₁, x₂) is not known. For the purpose of illustration, let X₁ and X₂ be discrete random variables with joint probabilities f(x₁, x₂) as given in Table 4.1. These values represent probabilities from a multinomial distribution (X₁, X₂) ~ MULT(3; 0.4, 0.4). For example, this model would apply to Example 4.2.1 if the sampling had been with replacement, or it would be an approximation to the extended hypergeometric model for the without-replacement case. First notice that

$$\sum_{x_1=0}^{3} \sum_{x_2=0}^{3} f(x_1, x_2) = 1$$

as shown in the table. It is convenient to include impossible outcomes such as (3, 3) in the table and assign them probability zero. Care must be taken with the limits of the summations so that the points with zero probability are not included inadvertently when the nonzero portion of the pdf is summed.
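TABLE 4.1 Values of the discrete pdf of MULT(3; 0.4, 0.4)

            x₂ = 0   x₂ = 1   x₂ = 2   x₂ = 3   f₁(x₁)
  x₁ = 0    0.008    0.048    0.096    0.064    0.216
  x₁ = 1    0.048    0.192    0.192    0.000    0.432
  x₁ = 2    0.096    0.192    0.000    0.000    0.288
  x₁ = 3    0.064    0.000    0.000    0.000    0.064
  f₂(x₂)    0.216    0.432    0.288    0.064    1.000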


Now we are interested in the "marginal" probability, say P[X₁ = 0], without regard to what value X₂ may assume. Relative to the joint sample space, X₂ has the effect of partitioning the event, say A, that X₁ = 0; computing the marginal probability P(A) = P[X₁ = 0] is equivalent to computing a total probability as discussed in Section 1.5. That is, if Bⱼ denotes the event that X₂ = j, then

$$A = (A \cap B_0) \cup (A \cap B_1) \cup (A \cap B_2) \cup (A \cap B_3)$$

and

$$P(A) = \sum_j P(A \cap B_j)$$

as shown in the right margin of the top row of the table. Similarly, we could compute P[X₁ = 1], P[X₁ = 2], and so on.

The numerical values of f₁(x₁) = P[X₁ = x₁] are given in the right margin of the table for each possible value of x₁. Clearly,

$$\sum_{x_1} f_1(x_1) = \sum_{x_1} \sum_{x_2} f(x_1, x_2) = 1 \tag{4.2.9}$$

so f₁(x₁) is a legitimate pdf and is referred to as the marginal pdf of X₁ relative to the original joint sample space. Similarly, numerical values of the function

$$f_2(x_2) = P[X_2 = x_2] = \sum_{i=0}^{3} f(i, x_2)$$

are given in the bottom margin of the table, and this provides a means of finding the pdf's of X₁ and X₂ from f(x₁, x₂).


Definition 4.2.2 


If the pair (X₁, X₂) of discrete random variables has the joint pdf f(x₁, x₂), then the marginal pdf's of X₁ and X₂ are

$$f_1(x_1) = \sum_{x_2} f(x_1, x_2) \tag{4.2.10}$$

$$f_2(x_2) = \sum_{x_1} f(x_1, x_2) \tag{4.2.11}$$

Because (4.2.10) and (4.2.11) are the pdf's of X₁ and X₂, another notation would be f_{X₁}(x₁) and f_{X₂}(x₂).
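The margins of Table 4.1 are just the row and column sums (4.2.10) and (4.2.11). The sketch below rebuilds the table from the MULT(3; 0.4, 0.4) formula and then forms both margins. Names such as `N_TRIALS` are illustrative.

```python
from math import factorial

N_TRIALS, P1, P2 = 3, 0.4, 0.4
P3 = 1 - P1 - P2


def f(x1, x2):
    """Joint pdf of MULT(3; 0.4, 0.4); impossible cells such as (3, 3) get 0."""
    x3 = N_TRIALS - x1 - x2
    if x3 < 0:
        return 0.0
    coef = factorial(N_TRIALS) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * P1**x1 * P2**x2 * P3**x3


values = range(N_TRIALS + 1)
f1 = [sum(f(x1, x2) for x2 in values) for x1 in values]  # (4.2.10): right margin
f2 = [sum(f(x1, x2) for x1 in values) for x2 in values]  # (4.2.11): bottom margin

for x1 in values:
    print(x1, [round(f(x1, x2), 3) for x2 in values], round(f1[x1], 3))
print("f2:", [round(v, 3) for v in f2], "total:", round(sum(f1), 3))
```

Both margins come out as 0.216, 0.432, 0.288, 0.064. These are the BIN(3, 0.4) probabilities, which the derivation that follows explains.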

Although the marginal pdf's were motivated by means of a tabled distribution, it often is possible to derive formulas for f₁(x₁) and f₂(x₂) if an analytic expression for f(x₁, x₂) is available.

For example, if (X₁, X₂) ~ MULT(n, p₁, p₂), then the marginal pdf of X₁ is

$$\begin{aligned}
f_1(x_1) &= \sum_{x_2=0}^{n-x_1} f(x_1, x_2)
= \sum_{x_2=0}^{n-x_1} \frac{n!}{x_1!\,x_2!\,(n-x_1-x_2)!}\, p_1^{x_1} p_2^{x_2} (1-p_1-p_2)^{n-x_1-x_2} \\
&= \frac{n!}{x_1!\,(n-x_1)!}\, p_1^{x_1} \sum_{x_2=0}^{n-x_1} \binom{n-x_1}{x_2} p_2^{x_2} (1-p_1-p_2)^{n-x_1-x_2} \\
&= \binom{n}{x_1} p_1^{x_1} \left[p_2 + (1-p_1-p_2)\right]^{n-x_1} \\
&= \binom{n}{x_1} p_1^{x_1} (1-p_1)^{n-x_1}
\end{aligned}$$

That is, X₁ ~ BIN(n, p₁). This is what we would expect, because X₁ is counting the number of occurrences of some event E₁ on n independent trials.
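The binomial-theorem step can be checked numerically for any particular parameters. This sketch sums the MULT(n, p₁, p₂) pdf over x₂ and compares the result with the BIN(n, p₁) pdf. The values n = 7, p₁ = 0.3, p₂ = 0.5 are arbitrary choices for illustration.

```python
from math import comb, factorial


def mult2_pdf(x1, x2, n, p1, p2):
    """Joint pdf of (X1, X2) ~ MULT(n, p1, p2)."""
    x3 = n - x1 - x2
    if x3 < 0:
        return 0.0
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1**x1 * p2**x2 * (1 - p1 - p2) ** x3


def marginal_x1(x1, n, p1, p2):
    """f1(x1): sum the joint pdf over x2 = 0, ..., n - x1, as in the derivation."""
    return sum(mult2_pdf(x1, x2, n, p1, p2) for x2 in range(n - x1 + 1))


n, p1, p2 = 7, 0.3, 0.5  # arbitrary illustrative parameters
max_gap = max(
    abs(marginal_x1(x1, n, p1, p2) - comb(n, x1) * p1**x1 * (1 - p1) ** (n - x1))
    for x1 in range(n + 1)
)
print(max_gap)  # floating-point noise only
```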

For the flower seed example, n = 3, p₁ = 0.4, and p₂ = 0.4. If we are interested only in X₁, the number of red flowering seeds obtained, then we can lump the white and pink flowering seeds together and reduce the problem to a binomial-type problem. Specifically, X₁ ~ BIN(3, 0.4). Similarly, the marginal distribution of X₂ is BIN(n, p₂) = BIN(3, 0.4). Of course, probabilities other than marginal probabilities can be computed from joint pdf's.
Theorem 4.2.2 If X is a k-dimensional discrete random variable and A is an event, then

$$P[X \in A] = \sum_{x \in A} f(x_1, x_2, \ldots, x_k) \tag{4.2.12}$$

For example, in the flower seed problem, we may want to know the probability that the number of red flowering seeds is less than the number of white flowering seeds. In other words, the event of interest is [X₁ < X₂]. If the sampling is done with replacement, Table 4.1 applies. The possible pairs corresponding to the event [X₁ < X₂] are (x₁, x₂) = (0, 1), (0, 2), (0, 3), and (1, 2).

It also is possible to express this event as [X ∈ A], where A is a region of the plane, which in this example is A = {(x₁, x₂) | x₁ < x₂}. In this setting, the procedure would be to sum f(x₁, x₂) over all pairs that are enclosed within the boundary of the region A. In this example, we would sum over the pairs above the graph of the line x₂ = x₁, which is the boundary of A, as shown in Figure 4.1, and thus

$$P[X_1 < X_2] = f(0, 1) + f(0, 2) + f(0, 3) + f(1, 2) = 0.048 + 0.096 + 0.064 + 0.192 = 0.400$$

Marginal probabilities can be evaluated as discussed in Section 2.2 by summing over the marginal pdf's.

A joint probability of special importance involves sets of the form A = (−∞, x₁] × ··· × (−∞, xₖ]; in other words, Cartesian products of intervals of the type (−∞, xᵢ], i = 1, 2, ..., k.

FIGURE 4.1 Region corresponding to the event [X₁ < X₂]
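Theorem 4.2.2 can be applied by brute force. The sketch below lists every table cell that lies in A = {x₁ < x₂} and has positive probability, then adds up those probabilities. It reuses the MULT(3; 0.4, 0.4) model of Table 4.1, and the function names are illustrative.

```python
from itertools import product
from math import factorial


def f(x1, x2, n=3, p1=0.4, p2=0.4):
    """Joint pdf of MULT(3; 0.4, 0.4), the model behind Table 4.1."""
    x3 = n - x1 - x2
    if x3 < 0:
        return 0.0
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1**x1 * p2**x2 * (1 - p1 - p2) ** x3


# (4.2.12): sum f over the points of A = {(x1, x2) : x1 < x2} that carry probability.
A = [(x1, x2) for x1, x2 in product(range(4), repeat=2) if x1 < x2 and f(x1, x2) > 0]
prob = sum(f(x1, x2) for x1, x2 in A)
print(A, round(prob, 3))
```

The cells (1, 3) and (2, 3) also satisfy x₁ < x₂, but they are impossible outcomes with probability 0, so the filter drops them.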
Definition 4.2.3

Joint CDF The joint cumulative distribution function of the k random variables X₁, X₂, ..., Xₖ is the function defined by

$$F(x_1, \ldots, x_k) = P[X_1 \le x_1, \ldots, X_k \le x_k] \tag{4.2.13}$$


That is, F(x₁, ..., xₖ) denotes the probability that the random vector X will assume a value in the indicated k-dimensional rectangle, A. As in the one-dimensional case, other events can be expressed in terms of events of type A so that the CDF completely specifies the probability model.

As in the one-dimensional case, the only requirement that a function must 
satisfy to qualify as a joint CDF is that it must assign probabilities to events in 
such a way that Definition 1.3.1 will be satisfied. In particular, a joint CDF must 
satisfy properties analogous to the properties given in Theorem 2.2.3 for the one- 
dimensional case. Properties of a bivariate CDF are listed below, and properties 
of a k-dimensional CDF would be similar. 


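A function F(x₁, x₂) is a bivariate CDF if and only if

$$\lim_{x_1 \to -\infty} F(x_1, x_2) = F(-\infty, x_2) = 0 \quad \text{for all } x_2 \tag{4.2.14}$$

$$\lim_{x_2 \to -\infty} F(x_1, x_2) = F(x_1, -\infty) = 0 \quad \text{for all } x_1 \tag{4.2.15}$$

$$\lim_{\substack{x_1 \to \infty \\ x_2 \to \infty}} F(x_1, x_2) = F(\infty, \infty) = 1 \tag{4.2.16}$$

$$F(b, d) - F(b, c) - F(a, d) + F(a, c) \ge 0 \quad \text{for all } a < b \text{ and } c < d \tag{4.2.17}$$

$$\lim_{h \to 0^+} F(x_1 + h, x_2) = \lim_{h \to 0^+} F(x_1, x_2 + h) = F(x_1, x_2) \quad \text{for all } x_1 \text{ and } x_2 \tag{4.2.18}$$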


Note that property (4.2.17) is a monotonicity condition that is the two-dimensional version of equation (2.2.11). This is needed to prevent the assignment of negative values as probabilities to events of the form A = (a, b] × (c, d]. In particular,

$$P[a < X_1 \le b,\ c < X_2 \le d] = F(b, d) - F(b, c) - F(a, d) + F(a, c)$$

which is the value on the left of inequality (4.2.17).

Property (4.2.18) asserts that F(x₁, x₂) is continuous from the right in each variable separately. Also note that (4.2.17) is something more than simply requiring that F(x₁, x₂) be nondecreasing in each variable separately.
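To see the rectangle formula in action, the sketch below builds the joint CDF of the Table 4.1 model by summing lattice probabilities. It then checks that F(b, d) − F(b, c) − F(a, d) + F(a, c) matches a direct sum over the cells in (a, b] × (c, d]. The choice a = 0, b = 2, c = 0, d = 1 is arbitrary.

```python
from math import factorial


def f(x1, x2, n=3, p1=0.4, p2=0.4):
    """Joint pdf of the tabled MULT(3; 0.4, 0.4) pair."""
    x3 = n - x1 - x2
    if x3 < 0:
        return 0.0
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1**x1 * p2**x2 * (1 - p1 - p2) ** x3


def F(x1, x2):
    """Joint CDF (4.2.13): P[X1 <= x1, X2 <= x2], summing f over the lattice points."""
    return sum(f(t1, t2) for t1 in range(4) for t2 in range(4) if t1 <= x1 and t2 <= x2)


a, b, c, d = 0, 2, 0, 1
via_cdf = F(b, d) - F(b, c) - F(a, d) + F(a, c)  # left side of (4.2.17)
direct = sum(f(t1, t2) for t1 in range(a + 1, b + 1) for t2 in range(c + 1, d + 1))
print(round(via_cdf, 3), round(direct, 3))
```

Only the cells (1, 1) and (2, 1) fall in the half-open rectangle (0, 2] × (0, 1], so both computations should give 0.192 + 0.192.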
Example 4.2.3 Consider the function defined as follows:

$$F(x_1, x_2) = \begin{cases} 0 & \text{if } x_1 + x_2 < -1 \\ 1 & \text{if } x_1 + x_2 \ge -1 \end{cases} \tag{4.2.19}$$

If we let a = c = −1 and b = d = 1 in (4.2.17), then

$$F(1, 1) - F(1, -1) - F(-1, 1) + F(-1, -1) = 1 - 1 - 1 + 0 = -1$$

which means that (4.2.17) is not satisfied. However, it is not hard to verify that (4.2.19) is nondecreasing in each variable separately, and all of the other properties are satisfied. Thus, a set function based on (4.2.19) would violate property (1.3.1) of the definition of probability.
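The counterexample can also be checked mechanically. The sketch evaluates the left side of (4.2.17) for (4.2.19) with a = c = −1 and b = d = 1. It then confirms, on a finite grid of test points only, that the function never decreases in either argument alone. The grid is an illustrative choice and is not a proof.

```python
def F(x1, x2):
    """Candidate function (4.2.19): 0 if x1 + x2 < -1, else 1."""
    return 0.0 if x1 + x2 < -1 else 1.0


def rectangle_mass(F, a, b, c, d):
    """Left side of (4.2.17): F(b, d) - F(b, c) - F(a, d) + F(a, c)."""
    return F(b, d) - F(b, c) - F(a, d) + F(a, c)


print(rectangle_mass(F, -1, 1, -1, 1))  # -1.0, so (4.2.17) fails

# F is nondecreasing in each argument separately, checked along a grid of points.
grid = [i / 4 for i in range(-12, 13)]
nondecreasing = all(
    F(u, y) <= F(v, y) and F(y, u) <= F(y, v)
    for u, v in zip(grid, grid[1:])
    for y in grid
)
print(nondecreasing)
```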


4.3 


JOINT CONTINUOUS DISTRIBUTIONS 


The joint CDF provides a means for defining a joint continuous distribution. 


Definition 4.3.1

A k-dimensional vector-valued random variable X = (X₁, X₂, ..., Xₖ) is said to be continuous if there is a function f(x₁, x₂, ..., xₖ), called the joint probability density function (joint pdf) of X, such that the joint CDF can be written as

$$F(x_1, \ldots, x_k) = \int_{-\infty}^{x_k} \cdots \int_{-\infty}^{x_1} f(t_1, \ldots, t_k)\, dt_1 \cdots dt_k \tag{4.3.1}$$

for all x = (x₁, ..., xₖ).


As in the one-dimensional case, the joint pdf can be obtained from the joint CDF by differentiation. In particular,

$$f(x_1, \ldots, x_k) = \frac{\partial^k}{\partial x_1 \cdots \partial x_k} F(x_1, \ldots, x_k) \tag{4.3.2}$$

wherever the partial derivatives exist. To serve the purpose of a joint pdf, two properties must be satisfied.


Theorem 4.3.1 Any function f(x₁, x₂, ..., xₖ) is a joint pdf of a k-dimensional random variable if and only if

$$f(x_1, \ldots, x_k) \ge 0 \quad \text{for all } x_1, \ldots, x_k \tag{4.3.3}$$

and

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k = 1 \tag{4.3.4}$$

Numerous applications can be modeled by joint continuous variables.

Example 4.3.1 Let X₁ denote the concentration of a certain substance in one trial of an experiment, and X₂ the concentration of the substance in a second trial of the experiment. Assume that the joint pdf is given by f(x₁, x₂) = 4x₁x₂; 0 < x₁ < 1, 0 < x₂ < 1, and zero otherwise. The joint CDF is given by

$$\begin{aligned}
F(x_1, x_2) &= \int_{-\infty}^{x_2} \int_{-\infty}^{x_1} f(t_1, t_2)\, dt_1\, dt_2 \\
&= \int_0^{x_2} \int_0^{x_1} 4t_1t_2\, dt_1\, dt_2 \\
&= x_1^2 x_2^2, \qquad 0 < x_1 < 1,\ 0 < x_2 < 1
\end{aligned}$$

This defines F(x₁, x₂) over the region (0, 1) × (0, 1), but there are four other regions of the plane where it must be defined. In particular, see Figure 4.2 for the definition of F(x₁, x₂) on the five regions.

FIGURE 4.2 Values of a joint CDF
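The figure itself did not survive extraction. As a sketch of one way to cover all five regions (our own derivation, not a reproduction of Figure 4.2), we can clamp each coordinate to [0, 1] before applying x₁²x₂². The code then checks (4.3.1) at one interior point with a midpoint-rule double integral. The midpoint rule is exact here because 4t₁t₂ is bilinear.

```python
def F(x1, x2):
    """Joint CDF of f(x1, x2) = 4*x1*x2 on (0, 1) x (0, 1), valid on the whole plane.

    Clamping each coordinate to [0, 1] reproduces all five regions:
    0 if either coordinate is <= 0, x1^2 x2^2 inside the square,
    x1^2 (or x2^2) when the other coordinate is >= 1, and 1 when both are >= 1.
    """
    u = min(max(x1, 0.0), 1.0)
    v = min(max(x2, 0.0), 1.0)
    return (u * v) ** 2


# Midpoint-rule check of (4.3.1) at (0.3, 0.7).
m = 50
h1, h2 = 0.3 / m, 0.7 / m
numeric = sum(
    4 * ((i + 0.5) * h1) * ((j + 0.5) * h2) * h1 * h2 for i in range(m) for j in range(m)
)
print(round(numeric, 6), F(0.3, 0.7))
```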
It also is possible to evaluate joint probabilities by integrating the joint pdf over the appropriate region. For example, we will find the probability that for both trials of the experiment, the "average concentration is less than 0.5." This event can be represented by [(X₁ + X₂)/2 < 0.5], or more generally by [(X₁, X₂) ∈ A], where A = {(x₁, x₂) | (x₁ + x₂)/2 < 0.5}. Thus,

$$\begin{aligned}
P[(X_1 + X_2)/2 < 0.5] &= P[(X_1, X_2) \in A] \\
&= \iint_A f(x_1, x_2)\, dx_1\, dx_2 \\
&= \int_0^1 \int_0^{1 - x_2} 4x_1x_2\, dx_1\, dx_2 \\
&= \int_0^1 2x_2(1 - x_2)^2\, dx_2 \\
&= \frac{1}{6}
\end{aligned}$$

The region A is illustrated in Figure 4.3.
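A simulation offers an independent check on the value 1/6. This sketch draws (X₁, X₂) from the joint pdf 4x₁x₂ by rejection sampling, which is a standard technique used here only for illustration. The fixed seed 2024 and the sample size are arbitrary choices.

```python
import random


def sample_joint(rng):
    """Draw (X1, X2) with pdf 4*x1*x2 on the unit square by rejection sampling.

    Propose (u, v) uniformly on the square and accept with probability
    f(u, v) / max f = 4uv / 4 = uv.
    """
    while True:
        u, v, w = rng.random(), rng.random(), rng.random()
        if w < u * v:
            return u, v


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 100_000
hits = sum(1 for _ in range(n) if sum(sample_joint(rng)) / 2 < 0.5)
estimate = hits / n
print(round(estimate, 3), "vs exact", round(1 / 6, 3))
```

With 100,000 draws, the standard error is roughly 0.001, so the estimate should land close to 0.167.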
For a general k-dimensional continuous random variable X = (X₁, ..., Xₖ) and a k-dimensional event A, we have

$$P[X \in A] = \int \cdots \int_A f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k \tag{4.3.5}$$

FIGURE 4.3 Region corresponding to the event [(X₁ + X₂)/2 < 0.5]

Earlier in the chapter the notion of marginal distributions was discussed for joint discrete random variables. A similar concept can be developed for joint continuous random variables, but the approach is slightly different. In particular, consider the joint CDF, F(x₁, x₂), of a pair of random variables X = (X₁, X₂). The CDF of X₁ is


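$$\begin{aligned}
F_1(x_1) &= P[X_1 \le x_1] \\
&= P[X_1 \le x_1, X_2 < \infty] \\
&= F(x_1, \infty) \\
&= \int_{-\infty}^{x_1} \left( \int_{-\infty}^{\infty} f(t_1, t_2)\, dt_2 \right) dt_1
\end{aligned}$$

Thus, for the continuous case, the distribution function that we can interpret as the marginal CDF of X₁ is given by F(x₁, ∞), and the pdf associated with F₁(x₁) is the quantity enclosed in parentheses. That is,

$$f_1(x_1) = \frac{d}{dx_1} F_1(x_1) = \frac{d}{dx_1} \int_{-\infty}^{x_1} \int_{-\infty}^{\infty} f(t_1, x_2)\, dx_2\, dt_1 = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2$$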


Similar results can be obtained for $X_2$, which suggests the following definition.


Definition 4.3.2

If the pair $(X_1, X_2)$ of continuous random variables has the joint pdf $f(x_1, x_2)$, then the marginal pdf’s of $X_1$ and $X_2$ are

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \qquad (4.3.6)$$

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 \qquad (4.3.7)$$

It follows from the preceding argument that $f_1(x_1)$ and $f_2(x_2)$ are the pdf’s of $X_1$ and $X_2$, and consequently, another possible notation is $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$. Consider the joint pdf in Example 4.3.1. The marginal pdf of $X_1$ is


$$f_1(x_1) = \int_0^1 4x_1x_2\,dx_2 = 4x_1\int_0^1 x_2\,dx_2 = 2x_1$$

for any $0 < x_1 < 1$, and zero otherwise. Similarly, $f_2(x_2) = 2x_2$ for any $0 < x_2 < 1$.
Actually, the argument preceding Definition 4.3.2 provides a general approach to defining marginal distributions.
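The integration that produces a marginal pdf can be imitated numerically. In this minimal sketch, the midpoint rule and the names `joint_pdf` and `marginal_x1` are choices made for illustration. Because the integrand is linear in $x_2$, the midpoint rule is exact here up to rounding error.

```python
def joint_pdf(x1: float, x2: float) -> float:
    """Joint pdf of Example 4.3.1: 4*x1*x2 on the unit square, zero elsewhere."""
    if 0 < x1 < 1 and 0 < x2 < 1:
        return 4 * x1 * x2
    return 0.0


def marginal_x1(x1: float, steps: int = 1000) -> float:
    """Approximate f1(x1), the integral of f(x1, x2) over 0 < x2 < 1, by the midpoint rule."""
    h = 1.0 / steps
    return h * sum(joint_pdf(x1, (k + 0.5) * h) for k in range(steps))


for x1 in (0.1, 0.5, 0.9):
    # The numerical marginal should match the exact result 2*x1.
    print(x1, round(marginal_x1(x1), 6), 2 * x1)
```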


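Definition 4.3.3

If $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ is a $k$-dimensional random variable with joint CDF $F(x_1, x_2, \ldots, x_k)$, then the marginal CDF of $X_j$ is

$$F_j(x_j) = \lim_{\substack{x_i \to \infty \\ \text{all } i \ne j}} F(x_1, \ldots, x_j, \ldots, x_k) \qquad (4.3.8)$$

Furthermore, if $\mathbf{X}$ is discrete, the marginal pdf is

$$f_j(x_j) = \underbrace{\sum \cdots \sum}_{\text{all } i \ne j} f(x_1, \ldots, x_j, \ldots, x_k) \qquad (4.3.9)$$

and if $\mathbf{X}$ is continuous, the marginal pdf is

$$f_j(x_j) = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{\text{all } i \ne j} f(x_1, \ldots, x_j, \ldots, x_k)\,dx_1 \cdots dx_{j-1}\,dx_{j+1} \cdots dx_k \qquad (4.3.10)$$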


Example 4.3.2 Let $X_1$, $X_2$, and $X_3$ be continuous with a joint pdf of the form $f(x_1, x_2, x_3) = c$; $0 < x_1 < x_2 < x_3 < 1$, and zero otherwise, where $c$ is a constant. First, note that $c = 6$, which follows from (4.3.4) because the region $0 < x_1 < x_2 < x_3 < 1$ has volume 1/6. Suppose it is desired to find the marginal of $X_3$. From (4.3.10) we obtain

$$f_3(x_3) = \int_0^{x_3}\int_0^{x_2} 6\,dx_1\,dx_2 = 6\int_0^{x_3} x_2\,dx_2 = 3x_3^2$$

if $0 < x_3 < 1$, and zero otherwise.
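An empirical check of $f_3(x_3) = 3x_3^2$ is possible. The sketch below relies on a standard fact that is not stated in this section: the density 6 on $0 < x_1 < x_2 < x_3 < 1$ is the joint pdf of three independent UNIF(0, 1) values sorted into increasing order. Under that assumption, $P[X_3 \le x_3] = x_3^3$, which is the integral of $3t^2$ from 0 to $x_3$. The function names and seed are hypothetical.

```python
import random


def sorted_uniform_triple(rng: random.Random) -> tuple:
    """Sort three independent UNIF(0, 1) draws; the result has joint pdf 6 on 0 < x1 < x2 < x3 < 1."""
    return tuple(sorted(rng.random() for _ in range(3)))


def estimate_cdf_x3(x3: float, n: int = 100_000, seed: int = 2) -> float:
    """Monte Carlo estimate of P[X3 <= x3], where X3 is the largest coordinate."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sorted_uniform_triple(rng)[2] <= x3)
    return hits / n


for x3 in (0.25, 0.5, 0.75):
    # Integrating f3(t) = 3t^2 from 0 to x3 gives x3**3.
    print(x3, round(estimate_cdf_x3(x3), 3), x3 ** 3)
```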


Notice that a similar procedure can be used to obtain the joint pdf of any subset of the original set of random variables. For example, the joint pdf of the pair $(X_1, X_2)$ is obtained by integrating $f(x_1, x_2, x_3)$ with respect to $x_3$ as follows:

$$f(x_1, x_2) = \int_{x_2}^{1} 6\,dx_3 = 6(1 - x_2)$$

if $0 < x_1 < x_2 < 1$, and zero otherwise.

A general formula for the joint pdf of a subset of an arbitrary collection of random variables would involve a rather complicated expression, and we will not attempt to give such a formula. However, the procedure described above, which involves integrating with respect to the “unwanted” variables, provides a general approach to this problem.


4.4 INDEPENDENT RANDOM VARIABLES


Suppose that $X_1$ and $X_2$ are discrete random variables with joint probabilities as given in Table 4.2. Note that $f(1, 1) = 0.2 = f_1(1)f_2(1)$. That is, $P[X_1 = 1 \text{ and } X_2 = 1] = P[X_1 = 1]P[X_2 = 1]$, and we would say that the events $[X_1 = 1]$ and $[X_2 = 1]$ are independent events in the usual sense of Chapter 1. However, $f(1, 2) = 0.1 \ne f_1(1)f_2(2)$, so the events $[X_1 = 1]$ and $[X_2 = 2]$ are not independent. Thus there is in general a dependence between the random variables $X_1$ and $X_2$, although certain events are independent. If $f(x_1, x_2) = f_1(x_1)f_2(x_2)$ for all possible $(x_1, x_2)$, then it would be reasonable to say in general that the random variables $X_1$ and $X_2$ are independent.

TABLE 4.2 Values of the joint pdf of two dependent random variables
Similarly, for continuous random variables, suppose that $f(x_1, x_2) = f_1(x_1)f_2(x_2)$ for all $x_1$ and $x_2$. It follows that if $a < b$ and $c < d$, then

$$\begin{aligned}
P[a \le X_1 \le b, c \le X_2 \le d] &= \int_c^d\int_a^b f(x_1, x_2)\,dx_1\,dx_2 \\
&= \int_c^d\int_a^b f_1(x_1)f_2(x_2)\,dx_1\,dx_2 \\
&= \int_a^b f_1(x_1)\,dx_1 \int_c^d f_2(x_2)\,dx_2 \\
&= P[a \le X_1 \le b]\,P[c \le X_2 \le d]
\end{aligned}$$

Thus, the events $A = [a \le X_1 \le b]$ and $B = [c \le X_2 \le d]$ are independent in the usual sense, where in set notation this was expressed as $P(A \cap B) = P(A)P(B)$. It is natural to extend the concept of independence of events to the independence of random variables, where random variables $X_1$ and $X_2$ would be said to be independent if all events of type $A$ and $B$ are independent. Note that these concepts apply to both discrete and continuous random variables.


Definition 4.4.1

Independent Random Variables Random variables $X_1, \ldots, X_k$ are said to be independent if for every $a_i < b_i$,

$$P[a_1 \le X_1 \le b_1, \ldots, a_k \le X_k \le b_k] = \prod_{i=1}^{k} P[a_i \le X_i \le b_i] \qquad (4.4.1)$$

The expression on the right of equation (4.4.1) is the product of the marginal probabilities $P[a_1 \le X_1 \le b_1], \ldots, P[a_k \le X_k \le b_k]$. The terminology stochastically independent also is often used in this context. If (4.4.1) does not hold for all $a_i < b_i$, the random variables are called dependent. Some properties that are equivalent to independence are stated in the following theorem.


Theorem 4.4.1

Random variables $X_1, \ldots, X_k$ are independent if and only if either of the following properties holds:

$$F(x_1, \ldots, x_k) = F_1(x_1) \cdots F_k(x_k) \qquad (4.4.2)$$

$$f(x_1, \ldots, x_k) = f_1(x_1) \cdots f_k(x_k) \qquad (4.4.3)$$

where $F_i(x_i)$ and $f_i(x_i)$ are the marginal CDF and pdf of $X_i$, respectively.
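For discrete variables, property (4.4.3) can be checked mechanically. Compute both marginals, then compare every cell of the joint pdf with the product of the marginals, including cells where the joint pdf is zero. The sketch below uses exact fractions. The dictionary representation and the helper names are hypothetical, and the two small tables are made-up illustrations, not Table 4.2.

```python
from fractions import Fraction as Fr


def marginals(joint: dict) -> tuple:
    """Return the marginal pdf's f1 and f2 of a joint pdf stored as {(x1, x2): prob}."""
    f1, f2 = {}, {}
    for (x1, x2), p in joint.items():
        f1[x1] = f1.get(x1, 0) + p
        f2[x2] = f2.get(x2, 0) + p
    return f1, f2


def is_independent(joint: dict) -> bool:
    """Check f(x1, x2) == f1(x1) * f2(x2) for every pair, treating missing cells as zero."""
    f1, f2 = marginals(joint)
    return all(joint.get((a, b), 0) == f1[a] * f2[b] for a in f1 for b in f2)


# Hypothetical table built as a product of marginals: independent.
product_table = {(a, b): pa * pb
                 for a, pa in [(1, Fr(1, 4)), (2, Fr(3, 4))]
                 for b, pb in [(1, Fr(1, 3)), (2, Fr(2, 3))]}
# Hypothetical table with the same values of x1 and x2 but mass shifted: dependent.
shifted_table = {(1, 1): Fr(1, 4), (1, 2): Fr(0), (2, 1): Fr(1, 12), (2, 2): Fr(2, 3)}

print(is_independent(product_table))  # True
print(is_independent(shifted_table))  # False
```

A single failing cell is enough to establish dependence, while independence requires every cell to pass.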


Clearly, the random variables in Example 4.3.1 are independent. Indeed, anytime the limits of the variables are not functionally related and the joint pdf can be factored into a function of $x_1$ times a function of $x_2$, say $f(x_1, x_2) = g(x_1)h(x_2)$, then it can be factored into a product of the marginal pdf’s, say $f(x_1, x_2) = f_1(x_1)f_2(x_2)$, by adjusting the constants properly. This is formalized as follows.


Theorem 4.4.2

Two random variables $X_1$ and $X_2$ with joint pdf $f(x_1, x_2)$ are independent if and only if:
1. the “support set,” $\{(x_1, x_2) \mid f(x_1, x_2) > 0\}$, is a Cartesian product, $A \times B$, and
2. the joint pdf can be factored into the product of functions of $x_1$ and $x_2$, $f(x_1, x_2) = g(x_1)h(x_2)$.


Example 4.4.1 The joint pdf of a pair $X_1$ and $X_2$ is

$$f(x_1, x_2) = 8x_1x_2, \qquad 0 < x_1 < x_2 < 1$$

and zero otherwise. This function can clearly be factored according to part (2) of the theorem, but the support set, $\{(x_1, x_2) \mid 0 < x_1 < x_2 < 1\}$, is a triangular region that cannot be represented as a Cartesian product. Thus, $X_1$ and $X_2$ are dependent.


Example 4.4.2 Consider now a pair $X_1$ and $X_2$ with joint pdf

$$f(x_1, x_2) = x_1 + x_2, \qquad 0 < x_1 < 1,\ 0 < x_2 < 1$$

and zero otherwise. In this case the support set is $\{(x_1, x_2) \mid 0 < x_1 < 1 \text{ and } 0 < x_2 < 1\}$, which can be represented as $A \times B$, where $A$ and $B$ are both the open interval $(0, 1)$. However, part (2) of the theorem is not satisfied because $x_1 + x_2$ cannot be factored as $g(x_1)h(x_2)$. Thus, $X_1$ and $X_2$ are dependent.
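To see concretely that (4.4.3) fails in Examples 4.4.1 and 4.4.2, it is enough to evaluate the joint pdf and the product of the marginals at a single point. The marginals below are worked out here rather than in the text.

* For $f(x_1, x_2) = 8x_1x_2$ on $0 < x_1 < x_2 < 1$, the marginals are $f_1(x_1) = 4x_1(1 - x_1^2)$ and $f_2(x_2) = 4x_2^3$.
* For $f(x_1, x_2) = x_1 + x_2$ on the unit square, the marginals are $f_1(x_1) = x_1 + 1/2$ and $f_2(x_2) = x_2 + 1/2$.

The function names are hypothetical.

```python
def triangle_joint(x1, x2):
    """Example 4.4.1: 8*x1*x2 on 0 < x1 < x2 < 1, zero elsewhere."""
    return 8 * x1 * x2 if 0 < x1 < x2 < 1 else 0.0


def triangle_product(x1, x2):
    """Product of the marginals 4*x1*(1 - x1**2) and 4*x2**3, each supported on (0, 1)."""
    if 0 < x1 < 1 and 0 < x2 < 1:
        return 4 * x1 * (1 - x1 ** 2) * 4 * x2 ** 3
    return 0.0


def sum_joint(x1, x2):
    """Example 4.4.2: x1 + x2 on the unit square, zero elsewhere."""
    return x1 + x2 if 0 < x1 < 1 and 0 < x2 < 1 else 0.0


def sum_product(x1, x2):
    """Product of the marginals x1 + 1/2 and x2 + 1/2, each supported on (0, 1)."""
    if 0 < x1 < 1 and 0 < x2 < 1:
        return (x1 + 0.5) * (x2 + 0.5)
    return 0.0


# (0.5, 0.25) is outside the triangular support but inside the square where both marginals are positive.
print(triangle_joint(0.5, 0.25), triangle_product(0.5, 0.25))
# Inside the square, x1 + x2 differs from (x1 + 1/2)(x2 + 1/2).
print(sum_joint(0.1, 0.1), sum_product(0.1, 0.1))
```

The first comparison shows the failure of part (1) of Theorem 4.4.2: the support is not a Cartesian product. The second shows the failure of part (2): the joint pdf does not factor.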


Many interesting problems can be modeled in terms of independent random variables.


Example 4.4.3 Two components in a rocket operate independently, and the probability that each component fails on a launch is $p$. Let $X$ denote the number of launches required to have a failure of component 1, and let $Y$ denote the number of launches required to have a failure of component 2. Assuming the launch trials are independent, each variable would follow a geometric distribution, $X, Y \sim \text{GEO}(p)$, and if the components are assumed to operate independently, a reasonable model for the pair of variables $(X, Y)$ would be

$$f_{X,Y}(x, y) = f_X(x)f_Y(y) = pq^{x-1}\,pq^{y-1} = p^2q^{x+y-2}, \qquad x = 1, 2, \ldots;\ y = 1, 2, \ldots$$
where $q = 1 - p$. The joint CDF of $X$ and $Y$ is

$$\begin{aligned}
F_{X,Y}(x, y) &= \sum_{i=1}^{x}\sum_{j=1}^{y} f_{X,Y}(i, j) \\
&= \sum_{i=1}^{x} pq^{i-1} \sum_{j=1}^{y} pq^{j-1} \\
&= (1 - q^x)(1 - q^y) \\
&= F_X(x)F_Y(y)
\end{aligned}$$

which also could be obtained directly from (4.4.2) of Theorem 4.4.1.
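The closed form $(1 - q^x)(1 - q^y)$ can be checked against the double sum using exact rational arithmetic. The sketch below uses Python's `fractions` module. The value $p = 1/3$ and the function names are arbitrary choices for illustration.

```python
from fractions import Fraction


def joint_cdf_by_sum(x: int, y: int, p: Fraction) -> Fraction:
    """F_{X,Y}(x, y): sum of p^2 q^(i + j - 2) over i = 1..x and j = 1..y."""
    q = 1 - p
    return sum(p * p * q ** (i + j - 2) for i in range(1, x + 1) for j in range(1, y + 1))


def joint_cdf_closed_form(x: int, y: int, p: Fraction) -> Fraction:
    """(1 - q^x)(1 - q^y), the product of the two geometric CDFs."""
    q = 1 - p
    return (1 - q ** x) * (1 - q ** y)


p = Fraction(1, 3)
for x, y in [(1, 1), (2, 5), (4, 3)]:
    assert joint_cdf_by_sum(x, y, p) == joint_cdf_closed_form(x, y, p)
print("closed form agrees with the double sum")
```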


We also are interested in the random variables $X$ and $T$, where $T = X + Y$ is the number of trials needed to get a failure in both components. Now, to have $X = x$ and $T = t$, it is necessary to have $X = x$ and $Y = t - x$, so the joint pdf of $X$ and $T$ can be given by

$$\begin{aligned}
f_{X,T}(x, t) &= P[X = x, T = t] \\
&= P[X = x, Y = t - x] \\
&= f_{X,Y}(x, t - x) \\
&= pq^{x-1}\,pq^{t-x-1} \\
&= p^2q^{t-2}, \qquad t = x + 1, x + 2, \ldots;\ x = 1, 2, \ldots
\end{aligned}$$

It is clear from Theorem 4.4.2 that $X$ and $T$ are dependent because the support set cannot be a Cartesian product. This is also reasonable intuitively, because a large value of $X$ implies an even larger value of $T$.


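As noted earlier, the probability model is specified completely by either the pdf or the CDF. For example, the joint CDF of $X$ and $T$ is

$$F_{X,T}(x, t) = \sum_{k=1}^{x}\sum_{j=k+1}^{t} p^2q^{j-2} = 1 - q^x - pxq^{t-1}$$

for $t = x + 1, x + 2, \ldots$; $x = 1, 2, \ldots, t - 1$. The marginal CDFs of $X$ and $T$ can be obtained by taking limits as in equation (4.3.8). Specifically,

$$F_X(x) = \lim_{t \to \infty} F_{X,T}(x, t) = 1 - q^x, \qquad x = 1, 2, \ldots$$

$$F_T(t) = \lim_{x \to \infty} F_{X,T}(x, t) = F_{X,T}(t - 1, t) = 1 - q^{t-1} - p(t - 1)q^{t-1}, \qquad t = 2, 3, \ldots$$

The middle equality holds because $X \le T - 1$ always, so $F_{X,T}(x, t) = F_{X,T}(t - 1, t)$ whenever $x \ge t - 1$.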
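The closed form for $F_{X,T}(x, t)$ and the marginal $F_T(t)$ can also be checked by brute-force summation of $p^2q^{j-2}$. The sketch below uses exact fractions. The value $p = 1/4$ and the helper names are illustrative assumptions. For $p = 1/4$, $F_T(3) = p^2 + 2p^2q = 5/32$, which the last line reproduces.

```python
from fractions import Fraction


def f_xt(x: int, t: int, p: Fraction) -> Fraction:
    """Joint pdf of X and T = X + Y: p^2 q^(t - 2) for x = 1, 2, ... and t > x."""
    q = 1 - p
    return p * p * q ** (t - 2) if x >= 1 and t > x else Fraction(0)


def cdf_xt_by_sum(x: int, t: int, p: Fraction) -> Fraction:
    """F_{X,T}(x, t) as the sum of f_{X,T}(k, j) over k = 1..x and j = k+1..t."""
    return sum((f_xt(k, j, p) for k in range(1, x + 1) for j in range(k + 1, t + 1)), Fraction(0))


def cdf_xt_closed(x: int, t: int, p: Fraction) -> Fraction:
    """1 - q^x - p x q^(t - 1), valid for t > x."""
    q = 1 - p
    return 1 - q ** x - p * x * q ** (t - 1)


def cdf_t(t: int, p: Fraction) -> Fraction:
    """Marginal CDF of T, F_{X,T}(t - 1, t) = 1 - q^(t-1) - p (t - 1) q^(t-1)."""
    return cdf_xt_closed(t - 1, t, p)


p = Fraction(1, 4)
for x, t in [(1, 2), (2, 6), (3, 4)]:
    assert cdf_xt_by_sum(x, t, p) == cdf_xt_closed(x, t, p)
print(cdf_t(3, p))  # 5/32
```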


The marginal pdf of $T$ could be obtained by taking differences, $f_T(t) = F_T(t) - F_T(t - 1)$, or directly from Definition 4.2.2:

$$f_T(t) = \sum_{x=1}^{t-1} f_{X,T}(x, t) = (t - 1)p^2q^{t-2}, \qquad t = 2, 3, \ldots$$

This is the pdf of a negative binomial variable with parameters $r = 2$ and $p$, which should not be surprising because $T$ is the number of trials required to obtain two failures. Thus, $T \sim \text{NB}(2, p)$. It also should come as no surprise that $X \sim \text{GEO}(p)$; this was assumed at the outset.

As noted earlier, the variables $X$ and $T$ are dependent and thus $f_{X,T}(x, t) \ne f_X(x)f_T(t)$ for some values of $x$ and $t$. It might be worth noting that verification of independence by seeing that the joint pdf factors into the product of the marginal pdf’s requires verification for every pair of values. To see this, suppose we take $p = 1/2$ in the above example. It turns out that the events $[X = 1]$ and $[T = 3]$ are independent, because $f_{X,T}(1, 3) = 1/8 = f_X(1)f_T(3)$. However, $X$ and $T$ are not independent unless all such events are independent. This is not true in this case because, for example, $f_{X,T}(2, 3) = 1/8$ but $f_X(2)f_T(3) = 1/16$. To show dependence, it suffices to find only one such pair of values.
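The $p = 1/2$ calculation can be reproduced exactly. The same functions also let one confirm that summing $f_{X,T}(x, t)$ over $x$ gives the NB(2, p) pdf. The sketch uses exact fractions, and the names are hypothetical. The loop prints `(1, 3) 1/8 1/8` and then `(2, 3) 1/8 1/16`.

```python
from fractions import Fraction

p = Fraction(1, 2)
q = 1 - p


def f_xt(x, t):
    """Joint pdf of X and T: p^2 q^(t - 2) for t = x + 1, x + 2, ...; x = 1, 2, ..."""
    return p * p * q ** (t - 2) if 1 <= x < t else Fraction(0)


def f_x(x):
    """Marginal pdf of X ~ GEO(p)."""
    return p * q ** (x - 1)


def f_t(t):
    """Marginal pdf of T ~ NB(2, p): (t - 1) p^2 q^(t - 2)."""
    return (t - 1) * p * p * q ** (t - 2)


for x, t in [(1, 3), (2, 3)]:
    # Compare the joint pdf with the product of the marginals at this pair.
    print((x, t), f_xt(x, t), f_x(x) * f_t(t))
```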


4.5 CONDITIONAL DISTRIBUTIONS


Recall that independence also is related to the concept of conditional probability, and this suggests that the definition of conditional probability of events could be extended to the concept of conditional random variables. In the previous example, one may be interested in a general formula for expressing conditional probabilities of the form

$$P[T = t \mid X = x] = \frac{P[X = x, T = t]}{P[X = x]} = \frac{f_{X,T}(x, t)}{f_X(x)}$$

which suggests the following definition.


Definition 4.5.1

Conditional pdf If $X_1$ and $X_2$ are discrete or continuous random variables with joint pdf $f(x_1, x_2)$, then the conditional probability density function (conditional pdf) of $X_2$ given $X_1 = x_1$ is defined to be

$$f(x_2 \mid x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} \qquad (4.5.1)$$

for values $x_1$ such that $f_1(x_1) > 0$, and zero otherwise.
Similarly, the conditional pdf of $X_1$ given $X_2 = x_2$ is

$$f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)} \qquad (4.5.2)$$

for $x_2$ such that $f_2(x_2) > 0$, and zero otherwise.

As noted in the previous example, for discrete variables a conditional pdf is actually a conditional probability. For example, if $X_1$ and $X_2$ are discrete, then $f(x_2 \mid x_1)$ is the conditional probability of the event $[X_2 = x_2]$ given the event $[X_1 = x_1]$. In the case of continuous random variables, the interpretation of the conditional pdf is not as obvious because $P[X_1 = x_1] = 0$. Although $f(x_2 \mid x_1)$ cannot be interpreted as a conditional probability in this case, it can be thought of as assigning conditional “probability density” to arbitrarily small intervals $[x_2, x_2 + \Delta x_2]$, in much the same way that the marginal pdf, $f_2(x_2)$, assigns marginal probability density. Thus, in the continuous case, the conditional probability of an event of the form $[a \le X_2 \le b]$ given $X_1 = x_1$ is

$$P[a \le X_2 \le b \mid X_1 = x_1] = \int_a^b f(x_2 \mid x_1)\,dx_2 = \frac{\int_a^b f(x_1, x_2)\,dx_2}{\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2} \qquad (4.5.3)$$

That is, the denominator is the total area under the joint pdf at $X_1 = x_1$, and the numerator is the amount of that area for which $a \le X_2 \le b$ (see Figure 4.4). This could be regarded as a way of assigning probability to an event $[a \le X_2 \le b]$ over a “slice,” $X_1 = x_1$, of the joint sample space of the pair $(X_1, X_2)$.

For this to be a valid way of assigning probability, $f(x_2 \mid x_1)$ must satisfy the usual properties of a pdf in the variable $x_2$ with $x_1$ fixed. The fact that $f(x_2 \mid x_1) \ge 0$ follows from (4.5.1); also note that

$$\int_{-\infty}^{\infty} f(x_2 \mid x_1)\,dx_2 = \int_{-\infty}^{\infty} \frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2 = \frac{f_1(x_1)}{f_1(x_1)} = 1$$

for the continuous case. The discrete case is similar.

FIGURE 4.4 Conditional distribution of probability

The concept of conditional distribution can be extended to vectors of random variables. Suppose, for example, that $\mathbf{X} = (X_1, \ldots, X_r, \ldots, X_k)$ has joint pdf $f(\mathbf{x})$ and $\mathbf{X}_1 = (X_1, \ldots, X_r)$ has joint pdf $f_1(\mathbf{x}_1)$. If $\mathbf{X}_2 = (X_{r+1}, \ldots, X_k)$, then the conditional pdf of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$ is $f(\mathbf{x}_2 \mid \mathbf{x}_1) = f(\mathbf{x})/f_1(\mathbf{x}_1)$ for values $\mathbf{x}_1$ such that $f_1(\mathbf{x}_1) > 0$. As an illustration, consider the random variables $X_1$, $X_2$, and $X_3$ of Example 4.3.2. The conditional pdf of $X_3$ given $(X_1, X_2) = (x_1, x_2)$ is

$$f(x_3 \mid x_1, x_2) = \frac{f(x_1, x_2, x_3)}{f(x_1, x_2)} = \frac{6}{6(1 - x_2)} = \frac{1}{1 - x_2}, \qquad 0 < x_1 < x_2 < x_3 < 1$$

and zero otherwise.
Some properties of conditional pdf’s, which correspond to similar properties of conditional probability, are stated in the following theorem.


Theorem 4.5.1

If $X_1$ and $X_2$ are random variables with joint pdf $f(x_1, x_2)$ and marginal pdf’s $f_1(x_1)$ and $f_2(x_2)$, then

$$f(x_1, x_2) = f_1(x_1)f(x_2 \mid x_1) = f_2(x_2)f(x_1 \mid x_2) \qquad (4.5.4)$$

and if $X_1$ and $X_2$ are independent, then

$$f(x_2 \mid x_1) = f_2(x_2) \qquad (4.5.5)$$

and

$$f(x_1 \mid x_2) = f_1(x_1) \qquad (4.5.6)$$
Other notations often are used for conditional pdf’s. For example, if $X$ and $Y$ are jointly distributed random variables, the conditional pdf of $Y$ given $X = x$ also could be written as $f_{Y|X}(y \mid x)$ or possibly $f_{Y|x}(y)$. In most applications, there will be no confusion if we suppress the subscripts and use the simpler notation $f(y \mid x)$. It is also common practice to speak of “conditional random variables” denoted by $Y \mid X$ or $Y \mid X = x$.


Consider the variables $X$ and $T$ of Example 4.4.3. The conditional pdf of $T$ given $X = x$ is

$$f(t \mid x) = \frac{f_{X,T}(x, t)}{f_X(x)} = \frac{p^2q^{t-2}}{pq^{x-1}} = pq^{t-x-1}, \qquad t = x + 1, x + 2, \ldots$$

Notice that this means that for any $x = 1, 2, \ldots$, the conditional pdf of $U = T - X$ given $X = x$ is

$$\begin{aligned}
f_{U|X}(u \mid x) &= P[T - x = u \mid X = x] \\
&= P[T = u + x \mid X = x] \\
&= f(u + x \mid x) \\
&= pq^{u-1}, \qquad u = 1, 2, \ldots
\end{aligned}$$

Thus, conditional on $X = x$, $U \sim \text{GEO}(p)$. The conditional pdf of $X$ given $T = t$ is

$$f(x \mid t) = \frac{p^2q^{t-2}}{(t - 1)p^2q^{t-2}} = \frac{1}{t - 1}, \qquad x = 1, 2, \ldots, t - 1$$

Thus, conditional on $T = t$, $X \sim \text{DU}(t - 1)$, the discrete uniform distribution on the integers $1, 2, \ldots, t - 1$.
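The two conditional distributions derived above can be tabulated directly from the joint and marginal pdf’s. The sketch uses exact fractions. The choice $p = 1/3$ and the function names are assumptions of this illustration.

```python
from fractions import Fraction


def conditional_x_given_t(x, t, p):
    """f(x | t) = f_{X,T}(x, t) / f_T(t) for x = 1, ..., t - 1, zero otherwise."""
    q = 1 - p
    joint = p * p * q ** (t - 2)
    marginal_t = (t - 1) * p * p * q ** (t - 2)
    return joint / marginal_t if 1 <= x <= t - 1 else Fraction(0)


def conditional_u_given_x(u, x, p):
    """f_{U|X}(u | x) = f_{X,T}(x, u + x) / f_X(x) for u = 1, 2, ..., zero otherwise."""
    q = 1 - p
    joint = p * p * q ** (u + x - 2)
    marginal_x = p * q ** (x - 1)
    return joint / marginal_x if u >= 1 else Fraction(0)


p = Fraction(1, 3)
# Given T = 5, X should be uniform on {1, 2, 3, 4}.
print([conditional_x_given_t(x, 5, p) for x in range(1, 5)])
# Given X = 2, U = T - X should have the GEO(p) pdf p q^(u - 1), whatever the value of x.
print([conditional_u_given_x(u, 2, p) for u in range(1, 4)])
```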


A piece of flat land is in the form of a right triangle with a southern boundary two miles long and an eastern boundary one mile long (see Figure 4.5).

FIGURE 4.5 A triangular region

The point at which an airborne seed lands is of interest. The following assumption will be made: Given that the seed lands within the boundaries, its coordinates $X$ and $Y$ are “uniformly” distributed over the surface of the triangle. In other words, the pair $(X, Y)$ has a constant joint pdf $f(x, y) = c > 0$ for points within the boundaries, and zero otherwise. By property (4.3.4), we have $c = 1$, since the triangle has area 1. The marginal pdf’s of $X$ and $Y$ are

$$f_1(x) = \int_0^{x/2} dy = \frac{x}{2}, \qquad 0 < x < 2$$

$$f_2(y) = \int_{2y}^{2} dx = 2(1 - y), \qquad 0 < y < 1$$

and zero otherwise. The conditional pdf of $Y$ given $X = x$ is $f(y \mid x) = f(x, y)/f_1(x) = 2/x$ for $0 < y < x/2$, and zero otherwise. This means that conditional on $X = x$, $Y \sim \text{UNIF}(0, x/2)$.

One possible interpretation of this result is the following. Suppose we are able to observe the x-coordinate but not the y-coordinate. If we observe that $X = x$, then it is sensible to use this information in assigning probability to events relative to $Y$. Thus, if we observe $X = 0.5$, then the conditional probability that $Y$ is in some interval, say $[0.1, 0.7]$, is

$$P[0.1 \le Y \le 0.7 \mid X = 0.5] = \int_{0.1}^{0.7} f(y \mid 0.5)\,dy = \int_{0.1}^{0.25} 4\,dy = 0.6$$

where the change in the upper limit is because $f(y \mid x) = 0$ if $x = 0.5$ and $y > 0.25$ (see Figure 4.5). A conditional probability can take this sort of information into account, whereas a marginal probability cannot. For comparison, the marginal probability of this event is

$$P[0.1 \le Y \le 0.7] = \int_{0.1}^{0.7}\int_{2y}^{2} f(x, y)\,dx\,dy = \int_{0.1}^{0.7} f_2(y)\,dy = \int_{0.1}^{0.7} 2(1 - y)\,dy = 0.72$$

Mathematically, there is nothing wrong with the marginal probability, but it cannot take information about related variables into account.
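A short computation makes the contrast between 0.6 and 0.72 concrete. The sketch below computes the exact conditional probability from $Y \mid X = x \sim \text{UNIF}(0, x/2)$. It estimates the marginal probability by simulation, drawing uniform points on the triangle $0 < y < x/2$, $0 < x < 2$ by rejection sampling. Rejection sampling is a choice made here, not in the text, and the names and seed are hypothetical.

```python
import random


def conditional_prob(a: float, b: float, x: float) -> float:
    """P[a <= Y <= b | X = x] when Y given X = x is UNIF(0, x/2)."""
    lo, hi = max(a, 0.0), min(b, x / 2)
    return max(hi - lo, 0.0) / (x / 2)


def sample_triangle(rng: random.Random) -> tuple:
    """Uniform point on the triangle 0 < y < x/2, 0 < x < 2, by rejection from [0, 2) x [0, 1)."""
    while True:
        x, y = 2 * rng.random(), rng.random()
        if y < x / 2:
            return x, y


def estimate_marginal_prob(a: float, b: float, n: int = 100_000, seed: int = 3) -> float:
    """Monte Carlo estimate of P[a <= Y <= b], ignoring the observed X."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if a <= sample_triangle(rng)[1] <= b)
    return hits / n


print(round(conditional_prob(0.1, 0.7, 0.5), 4))   # 0.6
print(round(estimate_marginal_prob(0.1, 0.7), 3))  # should be near 0.72
```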


In this example, the conditional pdf’s turned out to be uniform, but this is not 
always the case. 


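Consider the joint density discussed in Example 4.4.2,

$$f(x, y) = x + y, \qquad 0 < x < 1,\ 0 < y < 1$$

In this case, since $f_1(x) = x + 0.5$, for any $x$ between 0 and 1,

$$f(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{x + y}{x + 0.5}, \qquad 0 < y < 1$$

and zero otherwise. For example,

$$P[0 < Y < 0.5 \mid X = 0.25] = \int_0^{0.5} \frac{0.25 + y}{0.25 + 0.5}\,dy = \frac{1}{3}$$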
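The conditional probability 1/3 can be confirmed numerically. Because $f(y \mid x)$ is linear in $y$, a midpoint-rule sum reproduces the integral up to rounding. The names and the step count are illustrative choices.

```python
def conditional_pdf(y: float, x: float) -> float:
    """f(y | x) = (x + y) / (x + 1/2) for 0 < y < 1, zero otherwise (assumes 0 < x < 1)."""
    return (x + y) / (x + 0.5) if 0 < y < 1 else 0.0


def conditional_prob(a: float, b: float, x: float, steps: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of f(y | x) over (a, b)."""
    h = (b - a) / steps
    return h * sum(conditional_pdf(a + (k + 0.5) * h, x) for k in range(steps))


print(conditional_prob(0.0, 0.5, 0.25))  # approximately 1/3
# The conditional pdf integrates to 1 over (0, 1), as required of a pdf.
print(conditional_prob(0.0, 1.0, 0.25))
```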


4.6 RANDOM SAMPLES


If $X$ represents a random variable of interest, say the lifetime of a certain type of light bulb, then $f(x)$ represents the population probability density function of this variable. If one light bulb is selected “at random” from the population and placed in operation, then $f(x)$ provides the failure density for that light bulb. This process might be described as one trial of an experiment. Suppose that, conceptually, there is an infinite population of light bulbs of this type. Alternatively, suppose the actual population is large enough that it may be assumed to remain essentially the same after a finite number, $n$, of light bulbs are drawn from it. In either case, it would be reasonable to assume that the population probability density function also applies to the lifetime of the second light bulb drawn. Indeed, we could conduct $n$ trials of the experiment, and to distinguish the $n$ trials we may let $X_i$ denote the lifetime of the light bulb obtained on the $i$th trial, where each $X_i \sim f(x)$. That is, each $X_i$ is distributed according to the common parent population density. In addition, if the items are sampled (or the trials are conducted) in such a way that the outcome on one trial does not affect the probability distribution of the variable on a different trial, then the variables may be assumed to be independent. Ordinarily, we will assume that the trials of the experiment or the sampling are conducted in such a way that these two conditions are satisfied, and we will refer to this as “random sampling.”


Definition 4.6.1

Random Sample The set of random variables $X_1, \ldots, X_n$ is said to be a random sample of size $n$ from a population with density function $f(x)$ if the joint pdf has the form

$$f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n) \qquad (4.6.1)$$

That is, random sampling assumes that the sample is taken in such a way that the random variables for each trial are independent and follow the common population density function. In this case, the joint density function is the product of the common marginal densities. It also is common practice to refer to the set of observed values, or data, $x_1, x_2, \ldots, x_n$, obtained from the experiment as a random sample.

In many cases it is necessary to obtain actual observed data from a population to help validate an assumed model or to help select an appropriate model. If the data can be assumed to represent a random sample, then equation (4.6.1) provides the connecting link between the observed data and the mathematical model.


The lifetime of a certain type of light bulb is assumed to follow an “exponential” population density function given by

$$f(x) = e^{-x}, \qquad 0 < x < \infty \qquad (4.6.2)$$

where the lifetime is measured in years. If a random sample of size two is obtained from this population, then we would have

$$f(x_1, x_2) = e^{-(x_1 + x_2)}, \qquad 0 < x_i < \infty \qquad (4.6.3)$$

Now suppose that the total lifetime of the two light bulbs turned out to be $x_1 + x_2 = 0.5$ years. One may wonder whether this sample result is reasonable when the population density is given by equation (4.6.2). If not, then it may be that a different population model is more appropriate. Questions of this type can be answered by using equation (4.6.3). In particular, consider

$$P[X_1 + X_2 \le c] = \int_0^c\int_0^{c - x_2} e^{-(x_1 + x_2)}\,dx_1\,dx_2 = \int_0^c \left(e^{-x_2} - e^{-c}\right) dx_2 = 1 - ce^{-c} - e^{-c}$$

For $c = 0.5$, $P[X_1 + X_2 \le 0.5] = 0.09$; thus it would be unlikely to find the total lifetime of the two bulbs to be 0.5 years or less, if the true population model is given by equation (4.6.2).
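The value 0.09 can be reproduced from the closed form and checked against a simulated random sample of size two. The sketch below uses `random.expovariate(1.0)` from Python's standard library, which draws from the density $e^{-x}$ of (4.6.2). The function names and seed are hypothetical.

```python
import math
import random


def prob_total_below(c: float) -> float:
    """Closed form P[X1 + X2 <= c] = 1 - c e^{-c} - e^{-c} for two independent EXP(1) lifetimes."""
    return 1 - c * math.exp(-c) - math.exp(-c)


def simulate_total_below(c: float, n: int = 100_000, seed: int = 4) -> float:
    """Monte Carlo estimate of the same probability from n simulated samples of size two."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.expovariate(1.0) + rng.expovariate(1.0) <= c)
    return hits / n


print(round(prob_total_below(0.5), 4))  # 0.0902
print(round(simulate_total_below(0.5), 3))
```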
Specific techniques for making decisions or drawing inferences based on sample data will be emphasized in later chapters, and we now are interested primarily in developing the mathematical properties that will be needed in carrying out those statistical procedures.


EMPIRICAL DISTRIBUTIONS


It is possible to use some of the work on discrete distributions to study the adequacy of specific continuous models. For example, let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$, each distributed as $X \sim f(x)$, where, respectively, $f(x)$ and $F(x)$ are the population pdf and CDF. For each real $x$, let $W$ be the number of variables, $X_i$, in the random sample that are less than or equal to $x$. We can regard the occurrence of the event $[X_i \le x]$ for some $i$ as “success,” and because the variables in a random sample are independent, $W$ simply counts the number of successes on $n$ independent Bernoulli trials with probability of success $p = P[X \le x]$. Thus, $W \sim \text{BIN}(n, p)$ with $p = F(x)$. The relative frequency of a success on $n$ trials of the experiment would be $W/n$, which we will denote by $F_n(x)$, referred to as the empirical CDF. The property of statistical regularity, as discussed in Section 1.2, suggests that $F_n(x)$ should be close to $F(x)$ for large $n$. Thus, for any proposed model, the corresponding CDF, $F(x)$, should be consistent with the empirical CDF, $F_n(x)$, based on data from $F(x)$.

We now take a set of data $x_1, x_2, \ldots, x_n$ from a random sample of size $n$ from $f(x)$, and let $y_1 < y_2 < \cdots < y_n$ be the ordered values of the data. Then the empirical CDF based on this data can be represented as

$$F_n(x) = \begin{cases} 0 & x < y_1 \\ \dfrac{i}{n} & y_i \le x < y_{i+1},\ i = 1, \ldots, n - 1 \\ 1 & y_n \le x \end{cases} \qquad (4.6.4)$$


Consider the following data from a simulated sample of size $n = 10$ from the distribution of Example 2.3.2, where the CDF has the form $F(x) = 1 - (1 + x)^{-2}$, $x > 0$:

$x_i$: 0.85, 1.08, 0.35, 3.28, 1.24, 2.58, 0.02, 0.13, 0.22, 0.52
$y_i$: 0.02, 0.13, 0.22, 0.35, 0.52, 0.85, 1.08, 1.24, 2.58, 3.28

The graphs of $F(x)$ and $F_n(x)$ are shown in Figure 4.6. Although the graph of an empirical CDF, $F_n(x)$, is a step function, it generally should be possible to get at least a rough idea of the shape of the corresponding CDF $F(x)$. In this example, the sample size $n = 10$ is probably too small to conclude much, but the graphs in Figure 4.6 show a fairly good agreement.
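The step function (4.6.4) is easy to evaluate with a binary search over the ordered data. The sketch below applies Python's standard `bisect` module to the ten observations above. The names `empirical_cdf` and `model_cdf` are hypothetical, and `model_cdf` uses the CDF $F(x) = 1 - (1 + x)^{-2}$ given above.

```python
import bisect

data = [0.85, 1.08, 0.35, 3.28, 1.24, 2.58, 0.02, 0.13, 0.22, 0.52]
ordered = sorted(data)  # y_1 < y_2 < ... < y_n
n = len(ordered)


def empirical_cdf(x: float) -> float:
    """F_n(x) of (4.6.4): the fraction of observations less than or equal to x."""
    return bisect.bisect_right(ordered, x) / n


def model_cdf(x: float) -> float:
    """F(x) = 1 - (1 + x)^(-2) for x > 0, zero otherwise."""
    return 1 - (1 + x) ** -2 if x > 0 else 0.0


for x in (0.25, 0.5, 1.0, 2.0, 3.5):
    print(f"x={x:4}  F_n(x)={empirical_cdf(x):.1f}  F(x)={model_cdf(x):.3f}")
```

`bisect_right` counts the ordered values that are less than or equal to `x`. This matches the convention in (4.6.4) that $F_n$ jumps at each $y_i$ and is right-continuous.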
FIGURE 4.6 Comparison of a CDF with an empirical CDF

To select a suitable distribution for the population, it is helpful to have numerical estimates for unknown population parameters. For example, suppose it is desired to estimate the population mean $\mu$. The mean of the empirical distribution, computed from the sample data, is a natural choice as an estimate of $\mu$. Corresponding to the empirical CDF is a discrete distribution with pdf, say $f_n(x)$, which assigns the value $1/n$ at each of the data values $x = x_1, x_2, \ldots, x_n$ and zero otherwise. The mean of this distribution is $\sum_{i=1}^{n} x_i f_n(x_i) = x_1(1/n) + \cdots + x_n(1/n)$, which provides the desired estimate of $\mu$. This estimate, which is simply the arithmetic average of the data, is called the sample mean, denoted by $\bar{x} = \sum_{i=1}^{n} x_i/n$. This rationale also can be used to obtain estimates of other unknown parameter values. For example, an estimate of the population variance $\sigma^2$ would be the variance of the empirical distribution, $\sum_{i=1}^{n} (x_i - \bar{x})^2 f_n(x_i) = (x_1 - \bar{x})^2/n + \cdots + (x_n - \bar{x})^2/n$. However, it turns out that such a procedure tends to underestimate the population variance. This point will be discussed in Chapter 8, where it will be shown that the following modified version does not suffer from this problem. The sample variance is defined as $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)$.

Another illustration involves the estimation of a population proportion. For example, suppose a study is concerned with whether individuals in a population have been exposed to a particular contagious disease. The proportion, say $p$, who have been exposed is another parameter of the population. If $n$ individuals are selected at random from the population and $y$ is the number who have been exposed, then the sample proportion is defined as $\hat{p} = y/n$. This also can be treated as a special type of sample mean. For the $i$th individual in the sample, define $x_i = 1$ if the individual has been exposed and zero otherwise. The data $x_1, \ldots, x_n$ correspond to observed values of a random sample from a Bernoulli distribution with parameter $p$, and $\bar{x} = (x_1 + \cdots + x_n)/n = y/n = \hat{p}$.
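For the ten observations of the preceding example, three quantities can be computed with Python's standard `statistics` module: the sample mean, the variance of the empirical distribution (divisor $n$), and the sample variance $s^2$ (divisor $n - 1$). The module's `pvariance` and `variance` functions use exactly these two divisors. The 0/1 exposure data at the end are made up purely to show that $\hat{p}$ is a sample mean.

```python
import statistics

data = [0.85, 1.08, 0.35, 3.28, 1.24, 2.58, 0.02, 0.13, 0.22, 0.52]
n = len(data)

xbar = statistics.mean(data)                # sample mean, sum(x_i) / n
var_empirical = statistics.pvariance(data)  # variance of the empirical distribution, divisor n
s2 = statistics.variance(data)              # sample variance s^2, divisor n - 1

print(round(xbar, 3), round(var_empirical, 4), round(s2, 4))

# A sample proportion is the sample mean of 0/1 data (hypothetical exposure indicators).
exposed = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
p_hat = statistics.mean(exposed)
print(p_hat)  # 0.4
```

The two variance estimates differ only by the factor $(n - 1)/n$, so $s^2$ is always slightly larger.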
HISTOGRAMS


It usually is easier to study the distribution of probability in terms of the pdf, $f(x)$, rather than the CDF. This leads us to consider a different type of empirical distribution, known as a histogram. Although this concept generally is considered as a purely descriptive method in lower-level developments of probability and statistics, it is possible to provide a rationale in terms of the multinomial distribution, which was presented in Section 4.2.

Suppose the data can be sorted into $k$ disjoint intervals, say $I_j = (a_j, a_{j+1}]$; $j = 1, 2, \ldots, k$. Then the relative frequency, $f_j$, with which an observation falls into $I_j$ gives at least a rough indication of what range of values the pdf, $f(x)$, might have over that interval. This can be made more precise by considering $k + 1$ events $E_1, E_2, \ldots, E_{k+1}$. For $j = 1, 2, \ldots, k$, $E_j$ occurs if and only if some variable, $X_i$, from the random sample is in the interval $I_j$. The event $E_{k+1}$ occurs if an $X_i$ is not in any $I_j$. If $Y_j$ is the number of variables from the random sample that fall into $I_j$, and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_k)$, then $\mathbf{Y} \sim \text{MULT}(n, p_1, \ldots, p_k)$, where

$$p_j = F(a_{j+1}) - F(a_j) = \int_{a_j}^{a_{j+1}} f(x)\,dx \qquad (4.6.5)$$

Again, because of statistical regularity, we would expect the observed relative frequency, $f_j = y_j/n$, to be close to $p_j$ for large $n$. It usually is possible to choose the intervals so that the probability of $E_{k+1}$ is negligible. Actually, in practice, this is accomplished by choosing the intervals after the data have been obtained. This is a convenient practice, although theoretically incorrect.


The following observations represent the observed lifetimes in months of a random sample of 40 electrical parts, which have been ordered from smallest to largest:

  0.15    2.37    2.90    7.39    7.99   12.05   15.17   17.56
 22.40   34.84   35.39   36.38   39.52   41.07   46.50   50.52
 52.54   58.91   58.93   66.71   71.48   71.84   77.66   79.31
 80.90   90.87   91.22   96.35  108.92  112.26  122.71  126.87
127.05  137.96  167.59  183.53  282.49  335.33  341.19  409.97

We will use $k = 9$ intervals of length 50, $I_1 = (0, 50]$, $I_2 = (50, 100]$, and so on. The distribution of the data is summarized in Table 4.3.

TABLE 4.3 Frequency distribution of lifetimes of 40 electrical parts

Limits of I_j   Count y_j   Relative frequency f_j   Height
0-50            15          0.375                    0.0075
50-100          13          0.325                    0.0065
100-150          6          0.150                    0.0030
150-200          2          0.050                    0.0010
200-250          0          0.000                    0.0000
250-300          1          0.025                    0.0005
300-350          2          0.050                    0.0010
350-400          0          0.000                    0.0000
400-450          1          0.025                    0.0005

It is seen, for example, that the proportion 15/40 = 0.375 of the sample values fall below 50 months, so one would expect approximately 37.5% of the total population to fall before 50 months, and so on. Of course, the accuracy of these estimates or approximations would depend primarily on the sample size $n$. To plot this information so that it directly approximates the population pdf, place a rectangle over the interval (0, 50] with area 0.375, a rectangle over the interval (50, 100] with area 0.325, and so on. To achieve this, the height of the rectangles should be taken as the fraction desired divided by the length of the interval. Thus the height of the rectangle over (0, 50] should be 0.375/50 = 0.0075, the height over (50, 100] should be 0.325/50 = 0.0065, and so on. This results in Figure 4.7, which sometimes is referred to as a modified relative frequency histogram.
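The counts, relative frequencies, and heights in Table 4.3 can be recomputed from the 40 lifetimes. This minimal sketch assumes each observation falls in one of the nine right-closed intervals $(50(j - 1), 50j]$, which holds here because all values lie in (0, 450].

```python
import math

lifetimes = [
    0.15, 2.37, 2.90, 7.39, 7.99, 12.05, 15.17, 17.56,
    22.40, 34.84, 35.39, 36.38, 39.52, 41.07, 46.50, 50.52,
    52.54, 58.91, 58.93, 66.71, 71.48, 71.84, 77.66, 79.31,
    80.90, 90.87, 91.22, 96.35, 108.92, 112.26, 122.71, 126.87,
    127.05, 137.96, 167.59, 183.53, 282.49, 335.33, 341.19, 409.97,
]
width, k = 50.0, 9
n = len(lifetimes)

counts = [0] * k
for x in lifetimes:
    j = math.ceil(x / width) - 1  # zero-based index of the interval (j*50, (j+1)*50]
    counts[j] += 1

for j, y in enumerate(counts):
    rel_freq = y / n            # f_j = y_j / n
    height = rel_freq / width   # chosen so that the rectangle's area equals f_j
    print(f"{j * 50}-{(j + 1) * 50}  {y:2d}  {rel_freq:.3f}  {height:.4f}")
```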


FIGURE 4.7 Comparison of an exponential pdf with a modified relative frequency histogram


A smooth curve through the tops of the rectangles then would provide a direct 
approximation to the population pdf. The number and length of the intervals can 
be adjusted as desired, taking into account such factors as sample size or range of 
the data. Such a decision is purely subjective, however, and there is no universally 
accepted rule for doing this. 
The shape of the histogram of Figure 4.7 appears to be consistent with an exponential pdf, say

$$f(x) = (1/100)e^{-x/100}, \qquad 0 < x < \infty$$

and the graph of this pdf also is shown in the figure. In this case, the CDF is

$$F(x) = 1 - e^{-x/100}, \qquad 0 < x < \infty$$

According to this model, $P[X \le 50] = F(50) = 0.393$, $P[50 < X \le 100] = F(100) - F(50) = 0.239$, and so on. These probabilities are given in Table 4.4 and are compared with the observed frequencies for the sample of size 40.


TABLE 4.4 Observed and fitted probabilities

                Interval probabilities      Cumulative probabilities
Limits of I_j   Observed    Exponential     Observed    Exponential
0-50            0.375       0.393           0.375       0.393
50-100          0.325       0.239           0.700       0.632
100-150         0.150       0.145           0.850       0.777
150-200         0.050       0.088           0.900       0.865
200-250         0.000       0.053           0.900       0.918
250-300         0.025       0.032           0.925       0.950
300-350         0.050       0.020           0.975       0.970
350-400         0.000       0.012           0.975       0.982
400-450         0.025       0.007           1.000       0.989


The data in this example were simulated from an EXP(100) model, and the 
discrepancy between the empirical distribution and the true model results from 
the natural “sampling error” involved. As the sample size increases, the histogram 
or empirical distribution should approach the true model. Of course, in practice 
the true model would be unknown. 
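The exponential columns of Table 4.4 follow from the CDF $F(x) = 1 - e^{-x/100}$. The sketch below recomputes the interval probabilities $p_j = F(a_{j+1}) - F(a_j)$ of (4.6.5) and the cumulative values, printing them alongside the observed ones. The function name `expo_cdf` is hypothetical.

```python
import math

edges = [50 * j for j in range(10)]  # 0, 50, ..., 450
observed_counts = [15, 13, 6, 2, 0, 1, 2, 0, 1]
n = sum(observed_counts)


def expo_cdf(x: float, mean: float = 100.0) -> float:
    """CDF of the fitted exponential model, F(x) = 1 - exp(-x / 100)."""
    return 1 - math.exp(-x / mean) if x > 0 else 0.0


cum_obs = 0
for j, y in enumerate(observed_counts):
    a, b = edges[j], edges[j + 1]
    cum_obs += y
    fitted = expo_cdf(b) - expo_cdf(a)  # p_j of (4.6.5)
    print(f"{a}-{b}  obs {y / n:.3f}  exp {fitted:.3f}  "
          f"cum obs {cum_obs / n:.3f}  cum exp {expo_cdf(b):.3f}")
```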


Perhaps it should be noted that Definition 4.6.1 of a random sample does not apply to the case of “random sampling without replacement” from a finite population. The “random sampling” terminology is used in this case to reflect the fact that on each trial the elements remaining in the population are equally likely to be selected, but the trials are not independent and the usual counting techniques discussed earlier would apply here. The random sample terminology also may refer here to the idea that each subset of $n$ elements of the population is equally likely to be selected as the sample. Definition 4.6.1 is applicable to sampling from finite populations if the sampling is with replacement (or it would be approximately suitable if the population is quite large).
SUMMARY


The purpose of this chapter was to further develop the concept of a random variable to include experiments that involve two or more numerical responses. The joint pdf and CDF provide ways to express the joint probability distribution. When the random variables are considered individually, the marginal pdf’s and CDFs express their probability distributions.

When the joint pdf can be factored into the product of the marginal pdf’s, then the random variables are independent. If the jointly distributed random variables are dependent, then the conditional pdf’s provide a valuable way to express this dependence.

Information about the nature of the true probability model may be obtained by conducting $n$ independent trials of an experiment, and obtaining $n$ observed values of the random variable of interest. These observations constitute a random sample from a real or conceptual population. Useful information about the population distribution can be obtained by descriptive methods such as the empirical CDF or a histogram.


EXERCISES


1. For the discrete random variables defined in Exercise 1 of Chapter 3, tabulate:
(a) the joint pdf of $Y$ and $Z$.
(b) the joint pdf of $Z$ and $W$.

2. In Exercise 2 of Chapter 3, a game consisted of rolling a die and tossing a coin. If $X$ denotes the number of spots showing on the die plus the number of heads showing on the coin, and if $Y$ denotes just the number of spots showing on the die, tabulate the joint pdf of $X$ and $Y$.


3. Five cards are drawn without replacement from a regular deck of 52 cards. Let X represent
the number of aces, Y the number of kings, and Z the number of queens obtained. Give
the probability of each of the following events:

(a) $A = [X = 2]$.

(b) $B = [Y = 2]$.

(c) $A \cap B$.

(d) $A \cup B$.

(e) A given B.

(f) $[X = x]$.

(g) $[X < 2]$.

(h) $[X > 2]$.

(i) $[X = 2, Y = 2, Z = 1]$.

(j) Write an expression for the joint pdf of X, Y, and Z.
4. In reference to Example 4.2.1, find the probability for each of the following events.
(a) Exactly five red and two white.
(b) Exactly five red and two pink.


5. Rework Exercise 3, assuming that the cards were drawn with replacement.


6. An ordinary six-sided die is rolled 12 times. If $X_1$ is the number of 1's, $X_2$ is the number of
2's, and so on, give the probability for each of the following events:

(a) $[X_1 = 2, X_2 = 3, X_3 = 1, X_4 = 0, X_5 = 4, X_6 = 2]$.

(b) $[X_1 = X_2 = X_3 = X_4 = X_5 = X_6]$.

(c) $[X_1 = 1, X_2 = 2, X_3 = 3, X_4 = 4]$.

(d) Write an expression for the joint pdf of $X_1$, $X_3$, and $X_5$.


7. Suppose that $X_1$ and $X_2$ are discrete random variables with joint pdf of the form
$$f(x_1, x_2) = c(x_1 + x_2), \quad x_1 = 0, 1, 2;\ x_2 = 0, 1, 2$$
and zero otherwise. Find the constant c.


8. If X and Y are discrete random variables with joint pdf
$$f(x, y) = c\,\frac{2^{x+y}}{x!\,y!}, \quad x = 0, 1, 2, \ldots;\ y = 0, 1, 2, \ldots$$
and zero otherwise.
(a) Find the constant c.
(b) Find the marginal pdf's of X and Y.
(c) Are X and Y independent? Why or why not?


9. Let $X_1$ and $X_2$ be discrete random variables with joint pdf $f(x_1, x_2)$ given by the following
table:

            x2 = 1   x2 = 2   x2 = 3
  x1 = 1    1/12     1/6      0
  x1 = 2    0        1/9      1/5
  x1 = 3    1/18     1/4      2/15

(a) Find the marginal pdf's of $X_1$ and $X_2$.
(b) Are $X_1$ and $X_2$ independent? Why or why not?
(c) Find $P[X_1 < 2]$.
(d) Find $P[X_1 < X_2]$.
(e) Tabulate the conditional pdf's, $f(x_2 \mid x_1)$ and $f(x_1 \mid x_2)$.


10. Two cards are drawn at random without replacement from an ordinary deck. Let X be the
number of hearts and Y the number of black cards obtained.
(a) Write an expression for the joint pdf, $f(x, y)$.
(b) Tabulate the joint CDF, $F(x, y)$.
(c) Find the marginal pdf's, $f_1(x)$ and $f_2(y)$.
(d) Are X and Y independent?
(e) Find $P[Y = 1 \mid X = 1]$.
(f) Find $P[Y = y \mid X = 1]$.
(g) Find $P[Y = y \mid X = x]$.
(h) Tabulate $P[X + Y < z]$; $z = 0, 1, 2$.


11. Rework Exercise 10, assuming that the cards are drawn with replacement.


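12. Consider the function $F(x_1, x_2)$ defined as follows:
$$F(x_1, x_2) = \begin{cases} 0.25(x_1 + x_2)^2 & \text{if } 0 < x_1 < 1 \text{ and } 0 < x_2 < 1 \\ 0 & \text{if } x_1 < 0 \text{ or } x_2 < 0 \\ 1 & \text{otherwise} \end{cases}$$
Is $F(x_1, x_2)$ a bivariate CDF? Hint: Check the properties of Theorem 4.2.2.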


13. Prove Theorem 4.4.2.


14. Suppose the joint pdf of lifetimes of a certain part and a spare is given by
$$f(x, y) = e^{-(x+y)}, \quad 0 < x < \infty,\ 0 < y < \infty$$
and zero otherwise. Find each of the following:
(a) The marginal pdf's, $f_1(x)$ and $f_2(y)$.
(b) The joint CDF, $F(x, y)$.
(c) $P[X > 2]$.
(d) $P[X < Y]$.
(e) $P[X + Y > 2]$.
(f) Are X and Y independent?




15. Suppose $X_1$ and $X_2$ are the survival times (in days) of two white rats that were subjected
to different levels of radiation. Assume that $X_1$ and $X_2$ are independent,
$X_1 \sim \text{PAR}(1, 1)$ and $X_2 \sim \text{PAR}(1, 2)$.
(a) Give the joint pdf of $X_1$ and $X_2$.
(b) Find the probability that the second rat outlives the first rat, $P[X_1 < X_2]$.


16. Assume that X and Y are independent with $X \sim \text{UNIF}(-1, 1)$ and $Y \sim \text{UNIF}(0, 1)$.
Find the probability that the roots of the equation $h(t) = 0$ are real, where
$$h(t) = t^2 + 2Xt + Y$$


17. For the random variables $X_1$, $X_2$, and $X_3$ of Example 4.3.2:
(a) Find the marginal pdf $f_1(x_1)$.
(b) Find the marginal pdf $f_2(x_2)$.
(c) Find the joint pdf of the pair $(X_1, X_2)$.
18. Consider a pair of continuous random variables X and Y with a joint CDF of the form
$$F(x, y) = \begin{cases} 0.5xy(x + y) & \text{if } 0 < x < 1,\ 0 < y < 1 \\ 0.5x(x + 1) & \text{if } 0 < x < 1 < y \\ 0.5y(y + 1) & \text{if } 1 < x,\ 0 < y < 1 \\ 1 & \text{if } 1 < x,\ 1 < y \end{cases}$$
and zero otherwise. Find each of the following:
(a) The joint pdf, $f(x, y)$.
(b) $P[X < 0.5, Y < 0.5]$.
(c) $P[X < Y]$.
(d) $P[X + Y < 0.5]$.
(e) $P[X + Y < 1.5]$.
(f) $P[X + Y < z]$; $0 < z$.


19. Let X and Y be continuous random variables with a joint pdf of the form
$$f(x, y) = k(x + y), \quad 0 < x < y < 1$$
and zero otherwise.
(a) Find k so that $f(x, y)$ is a joint pdf.
(b) Find the marginals, $f_1(x)$ and $f_2(y)$.
(c) Find the joint CDF, $F(x, y)$.
(d) Find the conditional pdf $f(y \mid x)$.
(e) Find the conditional pdf $f(x \mid y)$.


20. Suppose that X and Y have the joint pdf
$$f(x, y) = 8xy, \quad 0 < x < y < 1$$
and zero otherwise. Find each of the following:
(a) The joint CDF $F(x, y)$.
(b) $f(y \mid x)$.
(c) $f(x \mid y)$.
(d) $P[X < 0.5 \mid Y = 0.75]$.
(e) $P[X < 0.5 \mid Y < 0.75]$.


21. Suppose that X and Y have the joint pdf
$$f(x, y) = (2/3)(x + 1), \quad 0 < x < 1,\ 0 < y < 1$$
and zero otherwise. Find each of the following:
(a) $f_1(x)$.
(b) $f_2(y)$.
(c) $f(y \mid x)$.
(d) $P[X + Y < 1]$.
(e) $P[X < 2Y < 3X]$.
(f) Are X and Y independent?
22. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a population with pdf
$f(x) = 3x^2$; $0 < x < 1$, and zero otherwise.

(a) Write down the joint pdf of $X_1, X_2, \ldots, X_n$.
(b) Find the probability that the first observation is less than 0.5, $P[X_1 < 0.5]$.
(c) Find the probability that all of the observations are less than 0.5.


23. Rework Exercise 22 if the random sample is from a Weibull population, $X_i \sim \text{WEI}(1, 2)$.


24. The following set of data consists of weight measurements (in ounces) for 60 major league
baseballs:

5.09 5.08 5.21 5.17 5.07 5.24 5.12 5.16 5.18 5.19
5.26 5.10 5.28 5.29 5.27 5.09 5.24 5.26 5.17 5.13
5.27 5.26 5.17 5.19 5.28 5.28 5.18 5.27 5.25 5.26
5.26 5.18 5.13 5.08 5.25 5.17 5.09 5.16 5.24 5.23
5.28 5.24 5.23 5.23 5.27 5.22 5.26 5.27 5.24 5.27
5.25 5.28 5.24 5.26 5.24 5.24 5.27 5.26 5.22 5.09

(a) Construct a frequency distribution by sorting the data into five intervals of length
0.05, starting at 5.05.

(b) Based on the result of (a), graph a modified relative frequency histogram.

(c) Construct a table that compares observed and fitted probabilities based on the
intervals of (a), and for pdf f(x) that is uniform on the interval [5.05, 5.30].
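A minimal Python sketch of parts (a) and (c) for the baseball data above. It stores the weights in hundredths of an ounce so that values lying exactly on an interval endpoint (such as 5.10) are binned without floating-point surprises. Assigning each endpoint to the interval on its right, and closing the last interval at 5.30, is our own convention, as is the helper name `frequency_table`.

```python
# Baseball weights (ounces) from the data above, stored in hundredths of an
# ounce (e.g. 5.09 -> 509) so that interval endpoints compare exactly.
DATA = """
509 508 521 517 507 524 512 516 518 519
526 510 528 529 527 509 524 526 517 513
527 526 517 519 528 528 518 527 525 526
526 518 513 508 525 517 509 516 524 523
528 524 523 523 527 522 526 527 524 527
525 528 524 526 524 524 527 526 522 509
"""
weights = [int(w) for w in DATA.split()]


def frequency_table(values, start=505, width=5, bins=5):
    """Count values in [start + k*width, start + (k+1)*width) for k = 0..bins-1.

    Convention (ours): the right endpoint of the last interval is included.
    Values outside [start, start + bins*width] are ignored.
    """
    counts = [0] * bins
    for v in values:
        k = (v - start) // width
        if v == start + bins * width:
            k = bins - 1  # put the endpoint 5.30 in the last interval
        if 0 <= k < bins:
            counts[k] += 1
    return counts


counts = frequency_table(weights)
n = len(weights)
for k, c in enumerate(counts):
    lo = (505 + 5 * k) / 100
    # Under UNIF(5.05, 5.30), each interval of length 0.05 has probability 0.05/0.25 = 0.2.
    print(f"{lo:.2f}-{lo + 0.05:.2f}: count {c:2d}, relative freq {c / n:.3f}, fitted prob 0.200")
```

With this convention the counts are 7, 4, 11, 14, and 24 out of n = 60, compared with 12 per interval expected under the fitted uniform model.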


25. For the first 10 observations in Exercise 24, graph the empirical CDF, $F_n(x)$, and also
graph the CDF, $F(x)$, of a uniform distribution on the interval [5.05, 5.30].


26. Consider the jointly distributed random variables $X_1$, $X_2$, and $X_3$ of Example 4.3.2.
(a) Find the joint pdf of $X_1$ and $X_3$.
(b) Find the joint pdf of $X_2$ and $X_3$.
(c) Find the conditional pdf of $X_2$ given $(X_1, X_3) = (x_1, x_3)$.
(d) Find the conditional pdf of $X_1$ given $(X_2, X_3) = (x_2, x_3)$.
(e) Find the conditional pdf of $(X_1, X_2)$ given $X_3 = x_3$.


27. Suppose $X_1, X_2$ is a random sample of size n = 2 from a discrete distribution with pdf
given by $f(1) = f(3) = .2$ and $f(2) = .6$.

(a) Tabulate the values of the joint pdf of $X_1$ and $X_2$.

(b) Tabulate the values of the joint CDF of $X_1$ and $X_2$, $F(x_1, x_2)$.

(c) Find $P[X_1 + X_2 < 4]$.

28. Suppose X and Y are continuous random variables with joint pdf $f(x, y) = 4(x - xy)$ if
$0 < x < 1$ and $0 < y < 1$, and zero otherwise.

(a) Are X and Y independent? Why or why not?

(b) Find $P[X < Y]$.

29. Suppose X and Y are continuous random variables with joint pdf given by $f(x, y) = 24xy$
if $0 < x$, $0 < y$, $x + y < 1$, and zero otherwise.

(a) Are X and Y independent? Why or why not?

(b) Find $P[Y > 2X]$.

(c) Find the marginal pdf of X.

30. Suppose X and Y are continuous random variables with joint pdf $f(x, y) = 60x^2y$ if
$0 < x$, $0 < y$, $x + y < 1$, and zero otherwise.
(a) Find the marginal pdf of X.
(b) Find the conditional pdf of Y given X = x.
(c) Find $P[Y > 1 \mid X = .5]$.

31. Suppose $X_1$ and $X_2$ are continuous random variables with joint pdf given by
$f(x_1, x_2) = 2(x_1 + x_2)$ if $0 < x_1 < x_2 < 1$, and zero otherwise.
(a) Find $P[X_2 > 2X_1]$.
(b) Find the marginal pdf of $X_2$.
(c) Find the conditional pdf of $X_1$ given $X_2 = x_2$.


PROPERTIES OF RANDOM VARIABLES


5.1 INTRODUCTION


The use of a random variable and its probability distribution has been discussed 
as a way of expressing a mathematical model for a nondeterministic physical 
phenomenon. The random variable may be associated with some numerical char- 
acteristic of a real or conceptual population of items, and the pdf represents the 
distribution of the population over the possible values of the characteristic. Quite 
often the true population density may be unknown. One possibility in this case is 
to consider a family of density functions indexed by an unknown parameter as a 
possible model, and then concentrate on selecting a value for the parameter. 

A major emphasis in statistics is to develop estimates of unknown parameters 
based on sample data. In some cases a parameter may represent a physically 
meaningful quantity, such as an average or mean value of the population. Thus, 
it is worthwhile to define and study various properties of random variables that 
may be useful in representing and interpreting the original population, as well as 
useful in estimating or selecting an appropriate model. In some cases, special
properties of a model (such as the no-memory property of the exponential
distribution) may be quite helpful in indicating the type of physical assumptions
that would be consistent with that model, although the implications of a model
usually are less clear. In such a case, more reliance must be placed on basic
descriptive measures such as the mean and variance of a distribution. In this
chapter, additional descriptive measures and further properties of random
variables will be developed.


PROPERTIES OF EXPECTED VALUES 


As noted in Chapter 2, it often is necessary to consider the expected value of
some function of one or more random variables. For example, a study might
involve a vector of k random variables, $\mathbf{X} = (X_1, X_2, \ldots, X_k)$, and we would wish
to know the expected value of some function of $\mathbf{X}$, say $Y = u(\mathbf{X})$. We could use
the standard notation $E(Y)$, or another possibility would be $E[u(\mathbf{X})]$, or $E_{\mathbf{X}}[u(\mathbf{X})]$,
where the subscript emphasizes that the sum or integral used to evaluate this
expected value is taken relative to the joint pdf of $\mathbf{X}$. The following theorem
asserts that both approaches yield the same result.


Theorem 5.2.1 If $\mathbf{X} = (X_1, \ldots, X_k)$ has a joint pdf $f(x_1, \ldots, x_k)$, and if $Y = u(X_1, \ldots, X_k)$ is a
function of $\mathbf{X}$, then $E(Y) = E_{\mathbf{X}}[u(X_1, \ldots, X_k)]$, where
$$E_{\mathbf{X}}[u(X_1, \ldots, X_k)] = \sum_{x_1} \cdots \sum_{x_k} u(x_1, \ldots, x_k) f(x_1, \ldots, x_k) \tag{5.2.1}$$
if $\mathbf{X}$ is discrete, and
$$E_{\mathbf{X}}[u(X_1, \ldots, X_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, \ldots, x_k) f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k \tag{5.2.2}$$
if $\mathbf{X}$ is continuous.


The proof will be omitted here, but the method of proof will be discussed in
Chapter 6. We use the results of the theorem to derive some additional properties
of expected values.


Theorem 5.2.2 If $X_1$ and $X_2$ are random variables with joint pdf $f(x_1, x_2)$, then
$$E(X_1 + X_2) = E(X_1) + E(X_2) \tag{5.2.3}$$




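Proof


Note that the expected value on the left side of equation (5.2.3) is relative to the
joint pdf of $\mathbf{X} = (X_1, X_2)$, while the terms on the right side could be relative to
either the joint or the marginal pdf's. Thus, a more precise statement of equation
(5.2.3) would be
$$E_{\mathbf{X}}(X_1 + X_2) = E_{\mathbf{X}}(X_1) + E_{\mathbf{X}}(X_2) = E_{X_1}(X_1) + E_{X_2}(X_2)$$
We will show this for the continuous case:
$$\begin{aligned}
E_{\mathbf{X}}(X_1 + X_2) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 + x_2) f(x_1, x_2)\, dx_1\, dx_2 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 f(x_1, x_2)\, dx_2\, dx_1 + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\, dx_1\, dx_2 \\
&= \int_{-\infty}^{\infty} x_1 \left[\int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2\right] dx_1 + \int_{-\infty}^{\infty} x_2 \left[\int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1\right] dx_2 \\
&= \int_{-\infty}^{\infty} x_1 f_{X_1}(x_1)\, dx_1 + \int_{-\infty}^{\infty} x_2 f_{X_2}(x_2)\, dx_2 \\
&= E_{X_1}(X_1) + E_{X_2}(X_2) \\
&= E(X_1) + E(X_2)
\end{aligned}$$


The discrete case is similar.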


It is possible to combine the preceding theorems to show that if $a_1, a_2, \ldots, a_k$
are constants and $X_1, X_2, \ldots, X_k$ are jointly distributed random variables, then
$$E\left(\sum_{i=1}^{k} a_i X_i\right) = \sum_{i=1}^{k} a_i E(X_i) \tag{5.2.4}$$


Another commonly encountered function of random variables is the product. 


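Theorem 5.2.3 If X and Y are independent random variables and $g(x)$ and $h(y)$ are functions,
then
$$E[g(X)h(Y)] = E[g(X)]E[h(Y)] \tag{5.2.5}$$


Proof


In the continuous case,
$$\begin{aligned}
E[g(X)h(Y)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y) f(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y) f_1(x) f_2(y)\, dx\, dy \\
&= \left[\int_{-\infty}^{\infty} g(x) f_1(x)\, dx\right]\left[\int_{-\infty}^{\infty} h(y) f_2(y)\, dy\right] \\
&= E[g(X)]E[h(Y)]
\end{aligned}$$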


It is possible to generalize this theorem to more than two variables. Specifically,
if $X_1, \ldots, X_k$ are independent random variables, and $u_1(x_1), \ldots, u_k(x_k)$ are
functions, then
$$E[u_1(X_1) \cdots u_k(X_k)] = E[u_1(X_1)] \cdots E[u_k(X_k)] \tag{5.2.6}$$


Certain expected values provide information about the relationship between 
two variables. 


Definition 5.2.1
The covariance of a pair of random variables X and Y is defined by
$$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] \tag{5.2.7}$$


Another common notation for covariance is $\sigma_{XY}$.


Some properties that are useful in dealing with covariances are given in the 
following theorems. 


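Theorem 5.2.4 If X and Y are random variables and a and b are constants, then
$$\operatorname{Cov}(aX, bY) = ab\operatorname{Cov}(X, Y) \tag{5.2.8}$$
$$\operatorname{Cov}(X + a, Y + b) = \operatorname{Cov}(X, Y) \tag{5.2.9}$$
$$\operatorname{Cov}(X, aX + b) = a\operatorname{Var}(X) \tag{5.2.10}$$
Proof


See Exercise 26.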


Theorem 5.2.5 If X and Y are random variables, then
$$\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) \tag{5.2.11}$$
and $\operatorname{Cov}(X, Y) = 0$ whenever X and Y are independent.


Proof


See Exercise 27.


Theorem 5.2.2 dealt with the expected value of a sum of two random variables.
The following theorem deals with the variance of a sum.


Theorem 5.2.6 If $X_1$ and $X_2$ are random variables with joint pdf $f(x_1, x_2)$, then
$$\operatorname{Var}(X_1 + X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2\operatorname{Cov}(X_1, X_2) \tag{5.2.12}$$
and
$$\operatorname{Var}(X_1 + X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) \tag{5.2.13}$$
whenever $X_1$ and $X_2$ are independent.


Proof
For convenience, denote the expected values of $X_1$ and $X_2$ by $\mu_1 = E(X_1)$ and
$\mu_2 = E(X_2)$. Then
$$\begin{aligned}
\operatorname{Var}(X_1 + X_2) &= E\{[(X_1 + X_2) - (\mu_1 + \mu_2)]^2\} \\
&= E\{[(X_1 - \mu_1) + (X_2 - \mu_2)]^2\} \\
&= E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + 2E[(X_1 - \mu_1)(X_2 - \mu_2)] \\
&= \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2\operatorname{Cov}(X_1, X_2)
\end{aligned}$$
which establishes equation (5.2.12). Equation (5.2.13) follows from Theorem 5.2.5.


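It also can be verified that if $X_1, \ldots, X_k$ are random variables and $a_1, \ldots, a_k$ are
constants, then
$$\operatorname{Var}\left(\sum_{i=1}^{k} a_i X_i\right) = \sum_{i=1}^{k} a_i^2 \operatorname{Var}(X_i) + 2\sum_{i<j} a_i a_j \operatorname{Cov}(X_i, X_j) \tag{5.2.14}$$
and if $X_1, \ldots, X_k$ are independent, then
$$\operatorname{Var}\left(\sum_{i=1}^{k} a_i X_i\right) = \sum_{i=1}^{k} a_i^2 \operatorname{Var}(X_i) \tag{5.2.15}$$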
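Equations (5.2.11) and (5.2.12) can be checked exactly on any small discrete joint pdf. The Python sketch below uses a hypothetical joint pdf of our own choosing (with dependent components, so the covariance term matters) and exact arithmetic from `fractions.Fraction`; the helper name `expect` is ours.

```python
from fractions import Fraction as F

# A hypothetical joint pdf f(x1, x2) with dependent components (values chosen only for illustration).
joint = {
    (0, 0): F(1, 10), (0, 1): F(1, 5), (0, 2): F(1, 10),
    (1, 0): F(1, 10), (1, 1): F(1, 10), (1, 2): F(2, 5),
}
assert sum(joint.values()) == 1  # a valid joint pdf


def expect(u):
    """E[u(X1, X2)] evaluated as the double sum in equation (5.2.1)."""
    return sum(u(x1, x2) * p for (x1, x2), p in joint.items())


mu1 = expect(lambda x1, x2: x1)
mu2 = expect(lambda x1, x2: x2)
var1 = expect(lambda x1, x2: (x1 - mu1) ** 2)
var2 = expect(lambda x1, x2: (x2 - mu2) ** 2)
cov = expect(lambda x1, x2: x1 * x2) - mu1 * mu2          # equation (5.2.11)

mu_sum = expect(lambda x1, x2: x1 + x2)                    # equals mu1 + mu2 by (5.2.3)
var_sum_direct = expect(lambda x1, x2: (x1 + x2 - mu_sum) ** 2)
var_sum_formula = var1 + var2 + 2 * cov                     # equation (5.2.12)

print("Cov(X1, X2) =", cov)
print("Var(X1 + X2):", var_sum_direct, "directly,", var_sum_formula, "via (5.2.12)")
```

Because the arithmetic is exact, the two variance computations agree exactly rather than approximately; dropping the `2 * cov` term would break the equality for this dependent pair.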


Example 5.2.1 Suppose that $Y \sim \text{BIN}(n, p)$. Because binomial random variables represent the
number of successes in n independent trials of a Bernoulli experiment, Y is
distributed as a sum, $Y = \sum_{i=1}^{n} X_i$, of independent Bernoulli variables, $X_i \sim \text{BIN}(1, p)$.
It follows from equations (5.2.4) and (5.2.15) that the mean and variance of Y are
$E(Y) = np$ and $\operatorname{Var}(Y) = npq$, because the mean and variance of a Bernoulli
variable are $E(X_i) = p$ and $\operatorname{Var}(X_i) = pq$. This is somewhat easier than the approach
used in Chapter 3.
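The shortcut of Example 5.2.1 can be compared with a brute-force computation from the binomial pdf. In this Python sketch the function name `binomial_moments` and the values n = 10, p = 3/10 are illustrative choices.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Mean and variance of BIN(n, p) computed directly from its pdf."""
    q = 1 - p
    pdf = [comb(n, y) * p**y * q**(n - y) for y in range(n + 1)]
    mean = sum(y * f for y, f in enumerate(pdf))
    var = sum((y - mean) ** 2 * f for y, f in enumerate(pdf))
    return mean, var


n, p = 10, Fraction(3, 10)
mean, var = binomial_moments(n, p)
# Compare with the sum-of-Bernoulli results E(Y) = np and Var(Y) = npq.
print(mean, n * p)              # 3 3
print(var, n * p * (1 - p))     # 21/10 21/10
```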


Example 5.2.2 The approach of the previous example also can be used if $Y \sim \text{HYP}(n, M, N)$,
but the derivation is more difficult because draws are dependent if sampling is
done without replacement. For example, suppose that a set of N components
contains M defective ones, and n components are drawn at random without
replacement. If $X_i$ represents the number of defective components (either 0 or 1)
obtained on the ith draw, then $X_1, X_2, \ldots, X_n$ are dependent Bernoulli variables.
Consider the pair $(X_1, X_2)$. It is not difficult to see that the marginal distribution
of the first variable is $X_1 \sim \text{BIN}(1, p_1)$ where $p_1 = M/N$, and conditional on
$X_1 = x_1$, $X_2 \sim \text{BIN}(1, p_2)$ where $p_2 = (M - x_1)/(N - 1)$. Thus, the joint pdf of
$(X_1, X_2)$ is
$$f(x_1, x_2) = p_1^{x_1} q_1^{1 - x_1} p_2^{x_2} q_2^{1 - x_2}, \quad x_i = 0, 1 \tag{5.2.16}$$
where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$, from which we can obtain the covariance,
$$\operatorname{Cov}(X_1, X_2) = -\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{1}{N - 1} \tag{5.2.17}$$


Actually, it can be shown that for any pair $(X_i, X_j)$ with $i \ne j$, $\operatorname{Cov}(X_i, X_j)$ is
given by equation (5.2.17), and for any i,
$$E(X_i) = \frac{M}{N} \tag{5.2.18}$$
and
$$\operatorname{Var}(X_i) = \frac{M}{N}\left(1 - \frac{M}{N}\right) \tag{5.2.19}$$
It follows from equations (5.2.4) and (5.2.14) that
$$E(Y) = n\frac{M}{N} \tag{5.2.20}$$
and
$$\operatorname{Var}(Y) = n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N - n}{N - 1} \tag{5.2.21}$$
In the case of equation (5.2.21), there are n terms of the form (5.2.19) and
n(n - 1) of the form (5.2.17), and the result follows after simplification.

Note that the mean and variance of the hypergeometric distribution have
forms similar to the mean and variance of the binomial, with p replaced by M/N,
except for the last factor in the variance, $(N - n)/(N - 1)$, which is referred to as
the finite multiplier term. Because the hypergeometric random variable, Y,
represents the number of defects obtained when sampling from a finite population
without replacement, it is clear that Var(Y) must approach zero as n approaches
N. That is, Y = M when n = N, with variance zero, whereas this effect does not
occur for the binomial case, which corresponds to sampling with replacement.
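A brute-force check of equations (5.2.20) and (5.2.21) against the hypergeometric pdf, in Python with exact fractions. The population size N = 20 and the M = 6 defectives are arbitrary illustrative values, and the function names are ours. The case n = N shows the finite multiplier driving the variance to zero.

```python
from fractions import Fraction
from math import comb


def hyp_moments(n, M, N):
    """Mean and variance of HYP(n, M, N) computed directly from its pdf."""
    support = range(max(0, n - (N - M)), min(n, M) + 1)
    pdf = {y: Fraction(comb(M, y) * comb(N - M, n - y), comb(N, n)) for y in support}
    mean = sum(y * f for y, f in pdf.items())
    var = sum((y - mean) ** 2 * f for y, f in pdf.items())
    return mean, var


def hyp_formula(n, M, N):
    """Equations (5.2.20) and (5.2.21), including the finite multiplier (N - n)/(N - 1)."""
    p = Fraction(M, N)
    return n * p, n * p * (1 - p) * Fraction(N - n, N - 1)


N, M = 20, 6
for n in (1, 5, 10, 20):
    print(n, hyp_moments(n, M, N), hyp_formula(n, M, N))
# At n = N every component is drawn, so Y = M with probability 1 and the variance is 0.
```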


APPROXIMATE MEAN AND VARIANCE 


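Chapter 2 discussed a method for approximating the mean and variance of a
function of a random variable X. Similar results can be developed for a function
of more than one variable. For example, consider a pair of random variables
(X, Y) with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and covariance $\sigma_{12}$; further
suppose that the function $H(x, y)$ has partial derivatives in an open rectangle
containing $(\mu_1, \mu_2)$. Using Taylor approximations, we obtain the following
approximate formulas for the mean and variance of $H(X, Y)$:
$$E[H(X, Y)] \approx H(\mu_1, \mu_2) + \frac{1}{2}\frac{\partial^2 H}{\partial x^2}\sigma_1^2 + \frac{1}{2}\frac{\partial^2 H}{\partial y^2}\sigma_2^2 + \frac{\partial^2 H}{\partial x\,\partial y}\sigma_{12}$$
$$\operatorname{Var}[H(X, Y)] \approx \left(\frac{\partial H}{\partial x}\right)^2\sigma_1^2 + \left(\frac{\partial H}{\partial y}\right)^2\sigma_2^2 + 2\left(\frac{\partial H}{\partial x}\right)\left(\frac{\partial H}{\partial y}\right)\sigma_{12}$$
where the partial derivatives are evaluated at the means $(\mu_1, \mu_2)$.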
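The two approximations can be packaged as a small Python function. This is a sketch under our own assumptions: the partial derivatives are estimated by central finite differences with step h rather than computed analytically, and the mean formula is the usual second-order Taylor expansion, which includes the cross-partial term multiplied by $\sigma_{12}$. The test function $H(x, y) = xy$ and the parameter values are hypothetical.

```python
def approx_mean_var(H, mu1, mu2, var1, var2, cov12, h=1e-4):
    """Approximate E[H(X, Y)] (second order) and Var[H(X, Y)] (first order).

    Partial derivatives at (mu1, mu2) are estimated with central differences
    of step h, so H only needs to be smooth near the means.
    """
    f0 = H(mu1, mu2)
    Hx = (H(mu1 + h, mu2) - H(mu1 - h, mu2)) / (2 * h)
    Hy = (H(mu1, mu2 + h) - H(mu1, mu2 - h)) / (2 * h)
    Hxx = (H(mu1 + h, mu2) - 2 * f0 + H(mu1 - h, mu2)) / h**2
    Hyy = (H(mu1, mu2 + h) - 2 * f0 + H(mu1, mu2 - h)) / h**2
    Hxy = (H(mu1 + h, mu2 + h) - H(mu1 + h, mu2 - h)
           - H(mu1 - h, mu2 + h) + H(mu1 - h, mu2 - h)) / (4 * h**2)
    mean = f0 + 0.5 * Hxx * var1 + 0.5 * Hyy * var2 + Hxy * cov12
    var = Hx**2 * var1 + Hy**2 * var2 + 2 * Hx * Hy * cov12
    return mean, var


# Hypothetical example: H(x, y) = xy, means (2, 3), variances 0.01 and 0.04, covariance 0.005.
m, v = approx_mean_var(lambda x, y: x * y, 2.0, 3.0, 0.01, 0.04, 0.005)
print(round(m, 4), round(v, 4))
```

For $H(x, y) = xy$ the mean approximation is exact, since $E(XY) = \mu_1\mu_2 + \sigma_{12}$ by equation (5.2.11); here it gives 6.005, and the variance approximation gives $9(0.01) + 4(0.04) + 2(3)(2)(0.005) = 0.31$.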


CORRELATION 


The importance of the mean and variance in characterizing the distribution of a
random variable was discussed earlier, and the covariance was described as a
useful measure of dependence between two random variables.

It was shown in Theorem 5.2.5 that $\operatorname{Cov}(X, Y) = 0$ whenever X and Y are
independent. The converse, in general, is not true.


Example 5.3.1 Consider a pair of discrete random variables X and Y with joint pdf $f(x, y) = 1/4$
if $(x, y) = (0, 1)$, $(1, 0)$, $(0, -1)$, or $(-1, 0)$. The marginal pdf of X is $f_1(\pm 1) = 1/4$,
$f_1(0) = 1/2$, and $f_1(x) = 0$ otherwise. The pdf of Y is similar. Thus,
$E(X) = -1(1/4) + 0(1/2) + 1(1/4) = 0$. Because $xy = 0$ whenever $f(x, y) > 0$, it follows that
$E(XY) = 0$. Consequently, $\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0$. However,
$f(0, 0) = 0 \ne f_1(0)f_2(0)$, so X and Y are dependent. In general, we can conclude
that X and Y are dependent if $\operatorname{Cov}(X, Y) \ne 0$, but $\operatorname{Cov}(X, Y) = 0$ does not
necessarily imply that X and Y are independent.
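The four-point joint pdf above can be checked mechanically. The Python sketch below uses exact fractions; the helper names `E`, `f1`, and `f2` are ours.

```python
from fractions import Fraction

# Joint pdf with mass 1/4 at each of the four points (0, 1), (1, 0), (0, -1), (-1, 0).
f = {(0, 1): Fraction(1, 4), (1, 0): Fraction(1, 4),
     (0, -1): Fraction(1, 4), (-1, 0): Fraction(1, 4)}


def E(u):
    """E[u(X, Y)] as a sum over the support."""
    return sum(u(x, y) * p for (x, y), p in f.items())


def f1(x):
    """Marginal pdf of X."""
    return sum(p for (a, _), p in f.items() if a == x)


def f2(y):
    """Marginal pdf of Y."""
    return sum(p for (_, b), p in f.items() if b == y)


cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
joint_00 = f.get((0, 0), Fraction(0))
print("Cov(X, Y) =", cov)                                        # 0
print("f(0, 0) =", joint_00, " f1(0) f2(0) =", f1(0) * f2(0))    # 0 versus 1/4
```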


Definition 5.3.1


If X and Y are random variables with variances $\sigma_X^2$ and $\sigma_Y^2$ and covariance
$\sigma_{XY} = \operatorname{Cov}(X, Y)$, then the correlation coefficient of X and Y is
$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \tag{5.3.1}$$


The random variables X and Y are said to be uncorrelated if $\rho = 0$; otherwise
they are said to be correlated.

A subscripted notation, $\rho_{XY}$, also is sometimes used. The following theorem
gives some important properties of the correlation coefficient.


Theorem 5.3.1 If ρ is the correlation coefficient of X and Y, then
$$-1 \le \rho \le 1 \tag{5.3.2}$$
and
$$\rho = \pm 1 \text{ if and only if } Y = aX + b \text{ with probability 1 for some } a \ne 0 \text{ and } b \tag{5.3.3}$$
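Proof


For convenience we will use the simplified notation $\mu_1 = \mu_X$, $\mu_2 = \mu_Y$, $\sigma_1 = \sigma_X$,
$\sigma_2 = \sigma_Y$, and $\sigma_{12} = \sigma_{XY}$.
To show equation (5.3.2), let
$$W = \frac{Y}{\sigma_2} - \rho\frac{X}{\sigma_1}$$
so that
$$\begin{aligned}
\operatorname{Var}(W) &= \left(\frac{1}{\sigma_2}\right)^2\sigma_2^2 + \left(\frac{\rho}{\sigma_1}\right)^2\sigma_1^2 - 2\frac{\rho}{\sigma_1\sigma_2}\sigma_{12} \\
&= 1 + \rho^2 - 2\rho^2 \\
&= 1 - \rho^2 \ge 0
\end{aligned}$$
because $\operatorname{Var}(W) \ge 0$.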


To show (5.3.3), notice that $\rho = \pm 1$ implies $\operatorname{Var}(W) = 0$, which by Theorem
2.4.7 implies that $P[W = \mu_W] = 1$, so that with probability 1,
$Y/\sigma_2 - \rho X/\sigma_1 = \mu_2/\sigma_2 - \rho\mu_1/\sigma_1$, or $Y = aX + b$ where $a = \rho\sigma_2/\sigma_1$ and
$b = \mu_2 - \rho\mu_1\sigma_2/\sigma_1$. On the other hand, if $Y = aX + b$, then by Theorems 2.4.3 and 5.2.4,
$\sigma_2^2 = a^2\sigma_1^2$ and $\sigma_{12} = a\sigma_1^2$, in which case $\rho = a/|a|$, so that $\rho = 1$ if $a > 0$ and
$\rho = -1$ if $a < 0$.


Example 5.3.2 Consider random variables X and Y with joint pdf of the form $f(x, y) = 1/20$ if
$(x, y) \in C$, and zero otherwise, where
$$C = \{(x, y) \mid 0 < x < 10,\ x - 1 < y < x + 1\}$$
which is shown in Figure 5.1.


FIGURE 5.1 Region corresponding to $0 < x < 10$ and $x - 1 < y < x + 1$


Although Y is not a function of X, the joint distribution of X and Y is "clustered"
around the line $y = x$, and we should expect ρ to be near 1.

The variance of X is $\sigma_1^2 = (10)^2/12 = 25/3$, the variance of Y is
$\sigma_2^2 = (10)^2/12 + (2)^2/12 = 26/3$, and the covariance is $\sigma_{12} = E(XY) - E(X)E(Y) = 25/3$.
Thus, the correlation coefficient is
$$\rho = \frac{25/3}{\sqrt{(25/3)(26/3)}} = \sqrt{25/26} = 0.981$$
which, as expected, is close to 1.
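A quick Monte Carlo check of $\rho \approx 0.981$ for the band-shaped region C. This sketch relies on one observation of ours: a uniform density on C can be generated by drawing $X \sim \text{UNIF}(0, 10)$ and then Y uniformly on $(X - 1, X + 1)$, because every vertical slice of C has the same length 2. The seed and sample size are arbitrary.

```python
import math
import random


def sample_band(n, seed=1):
    """Draw n points uniformly from C = {0 < x < 10, x - 1 < y < x + 1}."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [x + rng.uniform(-1, 1) for x in xs]  # Y | x is uniform on (x - 1, x + 1)
    return xs, ys


def sample_corr(xs, ys):
    """Sample correlation coefficient of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs, ys = sample_band(200_000)
r = sample_corr(xs, ys)
print(f"simulated r = {r:.4f}, exact rho = {math.sqrt(25 / 26):.4f}")
```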
CONDITIONAL EXPECTATION


It is possible to extend the notions of expectation and variance to the conditional
framework.


Definition 5.4.1
If X and Y are jointly distributed random variables, then the conditional expectation
of Y given X = x is given by
$$E(Y \mid x) = \sum_{y} y f(y \mid x) \quad \text{if X and Y are discrete} \tag{5.4.1}$$
$$E(Y \mid x) = \int_{-\infty}^{\infty} y f(y \mid x)\, dy \quad \text{if X and Y are continuous} \tag{5.4.2}$$


Other common notations for conditional expectation are $E_{Y \mid x}(Y)$ and
$E(Y \mid X = x)$.


Example 5.4.1 Consider Example 4.5.2, where a certain airborne particle lands at a random
point (X, Y) on a triangular region. For this example, the conditional pdf of Y
given X = x is
$$f(y \mid x) = \frac{2}{x}, \quad 0 < y < \frac{x}{2}$$
The conditional expectation is
$$E(Y \mid x) = \int_0^{x/2} y\left(\frac{2}{x}\right) dy = \frac{2}{x}\cdot\frac{(x/2)^2}{2} = \frac{x}{4}, \quad 0 < x < 2$$

If we are trying to "predict" the value of the vertical coordinate of (X, Y), then
$E(Y \mid x)$ should be more useful than the marginal expectation, $E(Y)$, because it
uses information about the horizontal coordinate. Of course, this assumes that
such information is available.

Notice that the conditional expectation of Y given X = x is a function of
x, say $u(x) = E(Y \mid x)$. The following theorem says that, in general, the random
variable $u(X) = E(Y \mid X)$ has the marginal expectation of Y, $E(Y)$.


Theorem 5.4.1 If X and Y are jointly distributed random variables, then
$$E[E(Y \mid X)] = E(Y) \tag{5.4.3}$$


Proof


Consider the continuous case:
$$\begin{aligned}
E[E(Y \mid X)] &= \int_{-\infty}^{\infty} E(Y \mid x) f_1(x)\, dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f(y \mid x)\, dy\, f_1(x)\, dx \\
&= \int_{-\infty}^{\infty} y \int_{-\infty}^{\infty} f(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} y f_2(y)\, dy \\
&= E(Y)
\end{aligned}$$


In the previous example, suppose that we wish to know the marginal expectation,
$E(Y)$. One possibility would be to use the previous theorem, with $E(Y \mid x) = x/4$
and $f_1(x) = x/2$ for $0 < x < 2$. Specifically,
$$E(Y) = E[E(Y \mid X)] = \int_0^2 \left(\frac{x}{4}\right)\left(\frac{x}{2}\right) dx = \frac{1}{3}$$


Example 5.4.2 Suppose the number of misspelled words in a student's term paper is a
Poisson-distributed random variable X with mean $E(X) = 20$. The student's roommate is
asked to proofread the paper in hopes of finding the spelling errors. On average,
the roommate finds 85% of such errors, and if x errors are present in the paper,
then it is reasonable to consider the number of spelling errors found to be a
binomial variable with parameters n = x and p = .85. In other words, conditional
on X = x, $Y \sim \text{BIN}(x, .85)$, and because, in general, the mean of a binomial
distribution is np, the conditional expectation is $E(Y \mid x) = .85x$. Thus, the
expected number of spelling errors that are found by the roommate would be
$E(Y) = E[E(Y \mid X)] = E(.85X) = .85E(X) = 17$.


An interesting situation occurs when X and Y are independent.


Theorem 5.4.2 If X and Y are independent random variables, then $E(Y \mid x) = E(Y)$
and $E(X \mid y) = E(X)$.
Proof


If X and Y are independent, then $f(x, y) = f_1(x)f_2(y)$, so that $f(y \mid x) = f_2(y)$ and
$f(x \mid y) = f_1(x)$. In the continuous case
$$E(Y \mid x) = \int_{-\infty}^{\infty} y f(y \mid x)\, dy = \int_{-\infty}^{\infty} y f_2(y)\, dy = E(Y)$$
The discrete case is similar.


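It also is useful to study the variance of a conditional distribution, which
usually is referred to as the conditional variance.


Definition 5.4.2
The conditional variance of Y given X = x is given by
$$\operatorname{Var}(Y \mid x) = E\{[Y - E(Y \mid x)]^2 \mid x\} \tag{5.4.4}$$


An equivalent form, which is analogous to equation (2.4.8), is
$$\operatorname{Var}(Y \mid x) = E(Y^2 \mid x) - [E(Y \mid x)]^2 \tag{5.4.5}$$


Theorem 5.4.3 If X and Y are jointly distributed random variables, then
$$\operatorname{Var}(Y) = E_X[\operatorname{Var}(Y \mid X)] + \operatorname{Var}_X[E(Y \mid X)] \tag{5.4.6}$$


Proof
$$\begin{aligned}
E_X[\operatorname{Var}(Y \mid X)] &= E_X\{E(Y^2 \mid X) - [E(Y \mid X)]^2\} \\
&= E(Y^2) - E_X\{[E(Y \mid X)]^2\} \\
&= E(Y^2) - [E(Y)]^2 - \{E_X\{[E(Y \mid X)]^2\} - [E(Y)]^2\} \\
&= \operatorname{Var}(Y) - \operatorname{Var}_X[E(Y \mid X)]
\end{aligned}$$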


This theorem indicates that, on the average (over X), the conditional variance
is less than or equal to the unconditional variance. Of course, they would be equal if
X and Y are independent, because $E(Y \mid X)$ then would not be a function of X,
and $\operatorname{Var}[E(Y \mid X)]$ would be zero. If one is interested in estimating the mean
height of an individual, $E(Y)$, then the theorem suggests that it might be easier to
estimate the individual's (conditional) height if you know the person's weight,
because the unconditional population of heights would have greater variance
than the reduced population of individuals all with fixed weight, x. This fact leads
to the important area of regression analysis, where information about one
variable is used to aid in understanding a related variable.
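For the spelling-error example above (X Poisson with mean 20, and Y given X = x binomial with parameters x and .85), both equation (5.4.3) and equation (5.4.6) can be checked numerically. The Python sketch below truncates the Poisson pdf at x = 150 (an assumption of ours; the neglected probability is negligible for a mean of 20), builds the marginal pdf of Y by summing $P[X = x]P[Y = y \mid x]$, and compares its mean and variance with the conditional decomposition.

```python
import math

LAM, P_FIND = 20.0, 0.85   # X ~ Poisson with mean 20; Y | x ~ BIN(x, .85)
X_MAX = 150                # truncation point for the Poisson pdf (neglected mass is negligible)

# Poisson pdf P[X = x], x = 0..X_MAX, built recursively to avoid huge factorials.
px = [math.exp(-LAM)]
for x in range(1, X_MAX + 1):
    px.append(px[-1] * LAM / x)

# Conditional side, using E(Y | x) = .85x and Var(Y | x) = x(.85)(.15).
e_cond_mean = sum(p * P_FIND * x for x, p in enumerate(px))                 # E[E(Y|X)]
e_cond_var = sum(p * x * P_FIND * (1 - P_FIND) for x, p in enumerate(px))    # E[Var(Y|X)]
var_cond_mean = sum(p * (P_FIND * x) ** 2 for x, p in enumerate(px)) - e_cond_mean ** 2

# Marginal side: P[Y = y] = sum over x of P[X = x] P[Y = y | x].
py = [0.0] * (X_MAX + 1)
for x, p in enumerate(px):
    for y in range(x + 1):
        py[y] += p * math.comb(x, y) * P_FIND ** y * (1 - P_FIND) ** (x - y)
mean_y = sum(y * q for y, q in enumerate(py))
var_y = sum(y * y * q for y, q in enumerate(py)) - mean_y ** 2

print(f"E(Y) = {mean_y:.6f}   E[E(Y|X)] = {e_cond_mean:.6f}")
print(f"Var(Y) = {var_y:.6f}   E[Var(Y|X)] + Var[E(Y|X)] = {e_cond_var:.6f} + {var_cond_mean:.6f}")
```

Both sides give a variance of 17, made up of 2.55 from the average conditional variance and 14.45 from the variability of the conditional mean, so in this example most of the variation in the number of errors found comes from variation in the number of errors present.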

An argument similar to that of Theorem 5.4.1 yields the following, more
general, theorem.


Theorem 5.4.4 If X and Y are jointly distributed random variables and $h(x, y)$ is a function, then
$$E[h(X, Y)] = E_X\{E[h(X, Y) \mid X]\} \tag{5.4.7}$$


This theorem says that a joint expectation, such as the left side of equation
(5.4.7), can be evaluated by first finding the conditional expectation $E[h(x, Y) \mid x]$,
and then finding its expectation relative to the marginal distribution of X. This
theorem is very useful in certain applications, when used in conjunction with the
following theorem, which is stated without proof.


Theorem 5.4.5 If X and Y are jointly distributed random variables, and $g(x)$ is a function, then
$$E[g(X)Y \mid x] = g(x)E(Y \mid x) \tag{5.4.8}$$


Example 5.4.3 If $(X, Y) \sim \text{MULT}(n, p_1, p_2)$, then by straightforward derivation it follows that
$X \sim \text{BIN}(n, p_1)$, $Y \sim \text{BIN}(n, p_2)$, and conditional on X = x, $Y \mid x \sim \text{BIN}(n - x, p)$,
where $p = p_2/(1 - p_1)$. The means and variances of X and Y follow from the
results of Section 5.2. Note also that $E(Y \mid x) = (n - x)p_2/(1 - p_1)$.

The two previous theorems can be used to find the covariance of X and Y.
Specifically,
$$\begin{aligned}
E(XY) &= E[E(XY \mid X)] \\
&= E[X E(Y \mid X)] \\
&= E\left[X(n - X)\frac{p_2}{1 - p_1}\right] \\
&= \frac{p_2}{1 - p_1}[nE(X) - E(X^2)]
\end{aligned}$$
If we substitute $E(X) = np_1$ and $E(X^2) = \operatorname{Var}(X) + (np_1)^2 = np_1[1 + (n - 1)p_1]$
and simplify, then the result is
$$E(XY) = n(n - 1)p_1 p_2$$
Thus,
$$\operatorname{Cov}(X, Y) = n(n - 1)p_1 p_2 - (np_1)(np_2) = -np_1 p_2$$
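The formulas $E(XY) = n(n - 1)p_1p_2$ and $\operatorname{Cov}(X, Y) = -np_1p_2$ in Example 5.4.3 can be confirmed by enumerating the multinomial pdf directly with exact fractions. The parameter values n = 6, $p_1 = 1/5$, $p_2 = 3/10$ below are illustrative only, and `trinomial_pdf` is our own helper name.

```python
from fractions import Fraction
from math import factorial


def trinomial_pdf(n, p1, p2):
    """Joint pdf of (X, Y) ~ MULT(n, p1, p2); the third cell gets probability 1 - p1 - p2."""
    p3 = 1 - p1 - p2
    pdf = {}
    for x in range(n + 1):
        for y in range(n - x + 1):
            z = n - x - y
            coef = factorial(n) // (factorial(x) * factorial(y) * factorial(z))
            pdf[(x, y)] = coef * p1 ** x * p2 ** y * p3 ** z
    return pdf


n, p1, p2 = 6, Fraction(1, 5), Fraction(3, 10)     # illustrative values
pdf = trinomial_pdf(n, p1, p2)
EX = sum(x * q for (x, y), q in pdf.items())
EY = sum(y * q for (x, y), q in pdf.items())
EXY = sum(x * y * q for (x, y), q in pdf.items())

print("E(XY) =", EXY, " n(n-1)p1p2 =", n * (n - 1) * p1 * p2)
print("Cov(X, Y) =", EXY - EX * EY, " -n p1 p2 =", -n * p1 * p2)
```

The negative covariance reflects the fixed total n: more observations in the first category leave fewer for the second.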
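As in Theorem 5.3.1, we adopt the convenient notation $\mu_1 = E(X)$, $\mu_2 = E(Y)$,
$\sigma_1^2 = \operatorname{Var}(X)$, and $\sigma_2^2 = \operatorname{Var}(Y)$.


Theorem 5.4.6 If $E(Y \mid x)$ is a linear function of x, then
$$E(Y \mid x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1) \tag{5.4.9}$$
and
$$E_X[\operatorname{Var}(Y \mid X)] = \sigma_2^2(1 - \rho^2) \tag{5.4.10}$$
Proof
Consider (5.4.9). If $E(Y \mid x) = ax + b$, then
$$\mu_2 = E(Y) = E_X[E(Y \mid X)] = E_X(aX + b) = a\mu_1 + b$$
and
$$\begin{aligned}
\sigma_{12} &= E[(X - \mu_1)(Y - \mu_2)] \\
&= E[(X - \mu_1)Y] - 0 \\
&= E_X\{E[(X - \mu_1)Y \mid X]\} \\
&= E_X[(X - \mu_1)E(Y \mid X)] \\
&= E_X[(X - \mu_1)(aX + b)] \\
&= a\sigma_1^2
\end{aligned}$$
Thus,
$$a = \frac{\sigma_{12}}{\sigma_1^2} = \rho\frac{\sigma_2}{\sigma_1} \quad\text{and}\quad b = \mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1$$
Equation (5.4.10) follows from Theorem 5.4.3:
$$\begin{aligned}
E_X[\operatorname{Var}(Y \mid X)] &= \operatorname{Var}(Y) - \operatorname{Var}_X\left[\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(X - \mu_1)\right] \\
&= \sigma_2^2 - \rho^2\frac{\sigma_2^2}{\sigma_1^2}\sigma_1^2 \\
&= \sigma_2^2(1 - \rho^2)
\end{aligned}$$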


Note that if the conditional variance does not depend on x, then
$$\operatorname{Var}(Y \mid x) = E_X[\operatorname{Var}(Y \mid X)] = \sigma_2^2(1 - \rho^2)$$
Thus, the amount of decrease in the variance of the conditional population
compared to the unconditional population depends on the correlation, ρ,
between the variables.


An important case will now be discussed in which $E(Y \mid x)$ is a linear function
of x and $\operatorname{Var}(Y \mid x)$ does not depend on x.


BIVARIATE NORMAL DISTRIBUTION


A pair of continuous random variables X and Y is said to have a bivariate
normal distribution if it has a joint pdf of the form
$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]\right\}, \quad -\infty < x < \infty,\ -\infty < y < \infty \tag{5.4.11}$$


A special notation for this is
$$(X, Y) \sim \text{BVN}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) \tag{5.4.12}$$
which depends on five parameters, $-\infty < \mu_1 < \infty$, $-\infty < \mu_2 < \infty$, $\sigma_1 > 0$,
$\sigma_2 > 0$, and $-1 < \rho < 1$.

The following theorem says that the marginal pdf's are normal and the notation
used for the parameters is appropriate.


Theorem 5.4.7 If $(X, Y) \sim \text{BVN}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, then
$$X \sim N(\mu_1, \sigma_1^2) \quad\text{and}\quad Y \sim N(\mu_2, \sigma_2^2)$$
and ρ is the correlation coefficient of X and Y.


Proof
See Exercise 23.


Strictly speaking, first we should have established that
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$$
but this would follow from Theorem 5.4.7.
It was noted in Section 5.2 that independent random variables are
uncorrelated. In other words, if X and Y are independent, then ρ = 0. Notice that the
above joint pdf, (5.4.11), factors into the product of the marginal pdf's if ρ = 0.
Thus, for bivariate normal variables, the terms "uncorrelated" and "independent"
can be used interchangeably, although this is not true for general bivariate
distributions.


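Theorem 5.4.8 If $(X, Y) \sim \text{BVN}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, then

1. conditional on X = x,
$$Y \mid x \sim N\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1),\ \sigma_2^2(1 - \rho^2)\right)$$
2. conditional on Y = y,
$$X \mid y \sim N\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2),\ \sigma_1^2(1 - \rho^2)\right)$$


Proof


See Exercise 24.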


This theorem shows that both conditional expectations are linear functions of
the conditional variable, and both conditional variances are constant. If ρ is close
to zero, then the conditional variance is close to the marginal variance; and if ρ is
close to ±1, then the conditional variance is close to zero.

As noted earlier in the chapter, a conditional expectation, say $E(Y \mid x)$, is a
function of x that, when applied to X, has the same expected value as Y. This
function sometimes is referred to as a regression function, and the graph of
$E(Y \mid x)$ is called the regression curve of Y on X = x. The previous theorem asserts
that for bivariate normal variables we have a linear regression function. It follows
from Theorem 5.4.3 that the variance of $E(Y \mid X)$ is less than or equal to that of
Y, and thus the regression function, in general, should be a better estimator of
$\mu_2 = E(Y)$ than is Y itself. Of course, this could be explained by the fact that the
marginal distribution of Y makes no use of information about X, whereas the
conditional distribution of Y given X = x does.
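Theorem 5.4.8 implies that a least-squares line fitted to a large bivariate normal sample should have slope close to $\rho\sigma_2/\sigma_1$ and residual variance close to $\sigma_2^2(1 - \rho^2)$. The Python sketch below simulates such a sample; generating the pairs from two independent standard normal variables, the parameter values, and the seed are our own choices for illustration.

```python
import math
import random


def sample_bvn(n, mu1, mu2, s1, s2, rho, seed=3):
    """Draw n pairs from BVN(mu1, mu2, s1^2, s2^2, rho) using two independent N(0, 1) draws.

    X = mu1 + s1*Z1 and Y = mu2 + s2*(rho*Z1 + sqrt(1 - rho^2)*Z2) have the
    required means, variances, and correlation.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((mu1 + s1 * z1, mu2 + s2 * (rho * z1 + math.sqrt(1 - rho**2) * z2)))
    return pairs


mu1, mu2, s1, s2, rho = 10.0, 5.0, 2.0, 3.0, 0.6
pairs = sample_bvn(100_000, mu1, mu2, s1, s2, rho)
n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
sxx = sum((x - mx) ** 2 for x, _ in pairs)
sxy = sum((x - mx) * (y - my) for x, y in pairs)
slope = sxy / sxx                      # least-squares slope of y on x
intercept = my - slope * mx
resid_var = sum((y - intercept - slope * x) ** 2 for x, y in pairs) / n

print("slope:", round(slope, 3), " theory rho*s2/s1 =", rho * s2 / s1)
print("residual variance:", round(resid_var, 3), " theory s2^2(1 - rho^2) =", s2**2 * (1 - rho**2))
```

Here the theoretical slope is 0.9 and the conditional variance is 5.76, compared with the marginal variance $\sigma_2^2 = 9$.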


Be) 


JOINT MOMENT GENERATING FUNCTIONS 


The moment generating function concept can be generalized to k-dimensional 
random variables. 


Theorem §.8.9 


Example 5.5.7 
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Definition 5.5.1 
The joint MGF of X= (X,,..., X,), if it exists, is defined to be 


Mx) =E exw (S+xa)| . (5.5.1) 
t=1 : 


where ¢ = (t;,...,¢,) and —h<t,<hforsomeh > 0. 


The joint MGF has properties analogous to those of the univariate MGF. 
Mixed moments such as E[X}X$] may be obtained by differentiating the joint 
MGF r times with respect to t; and s times with respect to t; and then setting all 
t; = 0. The joint MGF also uniquely determines the joint distribution of the vari- 
ables X,,..:, Xx. 

Note that it also is possible to obtain the MGF of the marginal distributions 
from the joint MGF. For example, ; 


My(t) = Mx, rts, 9) (5.5.2) 
My{t2) = Mx, (0, t2) (5.5.3) 


If My y(ty, t,) exists, then the random variables X and Y are independent if and 
only if 


My, v(t, t2) = Mxlt,)M y(t) 


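Suppose that $\mathbf{X} = (X_1, \ldots, X_k) \sim \mathrm{MULT}(n, p_1, \ldots, p_k)$.

We have discussed earlier that the marginal distributions are binomial, $X_i \sim \mathrm{BIN}(n, p_i)$.

The joint MGF of the multinomial distribution may be evaluated along the lines followed for the binomial distribution to obtain

$$\begin{aligned}
M_{\mathbf{X}}(\mathbf{t}) &= E\left[\exp\left(\sum_{i=1}^{k} t_i X_i\right)\right] \\
&= \sum \frac{n!}{x_1! \cdots x_{k+1}!} (p_1 e^{t_1})^{x_1} \cdots (p_k e^{t_k})^{x_k} p_{k+1}^{x_{k+1}} \\
&= (p_1 e^{t_1} + \cdots + p_k e^{t_k} + p_{k+1})^n \qquad (5.5.4)
\end{aligned}$$

where $p_{k+1} = 1 - p_1 - \cdots - p_k$ and the sum is over all nonnegative integers $x_1, \ldots, x_{k+1}$ with $x_1 + \cdots + x_{k+1} = n$.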
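As a check on (5.5.4) and on the rule for mixed moments, the Python sketch below (an illustration, not part of the original text) enumerates every outcome of a small multinomial, compares the brute-force expectation with the closed form, and estimates $E(X_1 X_2)$ by a central finite difference of the joint MGF. For the multinomial this moment equals $n(n-1)p_1 p_2$. The function names, test parameters, and step size `h` are illustrative assumptions.

```python
import itertools
import math


def multinomial_pmf(counts, n, probs):
    """Multinomial probability of the cell counts (x_1, ..., x_{k+1})."""
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # exact: every partial quotient is an integer
    return coef * math.prod(p ** c for p, c in zip(probs, counts))


def mgf_by_enumeration(t, n, p):
    """E[exp(t_1 X_1 + ... + t_k X_k)] for X ~ MULT(n, p_1, ..., p_k), summed over all outcomes."""
    probs = list(p) + [1.0 - sum(p)]  # p_{k+1} = 1 - p_1 - ... - p_k
    total = 0.0
    for counts in itertools.product(range(n + 1), repeat=len(p)):
        rest = n - sum(counts)  # x_{k+1}
        if rest < 0:
            continue
        weight = math.exp(sum(ti * xi for ti, xi in zip(t, counts)))
        total += weight * multinomial_pmf(list(counts) + [rest], n, probs)
    return total


def mgf_closed_form(t, n, p):
    """Equation (5.5.4): (p_1 e^{t_1} + ... + p_k e^{t_k} + p_{k+1})^n."""
    return (sum(pi * math.exp(ti) for pi, ti in zip(p, t)) + 1.0 - sum(p)) ** n


def mixed_moment_x1x2(n, p, h=1e-4):
    """E[X_1 X_2] as a central-difference estimate of d^2 M / dt_1 dt_2 at t = 0."""
    def m(a, b):
        return mgf_closed_form([a, b] + [0.0] * (len(p) - 2), n, p)
    return (m(h, h) - m(h, -h) - m(-h, h) + m(-h, -h)) / (4 * h * h)
```

For example, with $n = 5$ and $(p_1, p_2, p_3) = (0.2, 0.3, 0.1)$, `mixed_moment_x1x2` should return a value close to $5 \cdot 4 \cdot 0.2 \cdot 0.3 = 1.2$.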
Clearly, the joint marginal distributions also are multinomial. For example, if $(X_1, X_2, X_3) \sim \mathrm{MULT}(n, p_1, p_2, p_3)$, then

$$\begin{aligned}
M_{X_1, X_2}(t_1, t_2) &= M_{X_1, X_2, X_3}(t_1, t_2, 0) \\
&= [p_1 e^{t_1} + p_2 e^{t_2} + p_3 + (1 - p_1 - p_2 - p_3)]^n \\
&= [p_1 e^{t_1} + p_2 e^{t_2} + 1 - p_1 - p_2]^n
\end{aligned}$$

so

$$(X_1, X_2) \sim \mathrm{MULT}(n, p_1, p_2)$$


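Consider a pair of bivariate normal random variables with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient $\rho$. In other words, $(X, Y) \sim \mathrm{BVN}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. The joint MGF of $X$ and $Y$ can be evaluated directly by integration, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp(t_1 x + t_2 y) f(x, y)\,dx\,dy$, with $f(x, y)$ given by equation (5.4.11). The direct approach is somewhat tedious, so we will make use of some of the results on conditional expectations. Specifically, from Theorems 5.4.4 and 5.4.5, it follows that

$$\begin{aligned}
M_{X,Y}(t_1, t_2) &= E[\exp(t_1 X + t_2 Y)] \\
&= E_X\{E[\exp(t_1 X + t_2 Y) \mid X]\} \\
&= E_X\{\exp(t_1 X)\, E[\exp(t_2 Y) \mid X]\} \qquad (5.5.5)
\end{aligned}$$

Furthermore, by Theorem 5.4.8,

$$Y \mid x \sim N\left(\mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x - \mu_1),\ \sigma_2^2(1 - \rho^2)\right)$$

so that

$$E[\exp(t_2 Y) \mid X = x] = \exp\left\{\left[\mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x - \mu_1)\right] t_2 + \frac{\sigma_2^2(1 - \rho^2)\, t_2^2}{2}\right\}$$

After substitution into equation (5.5.5) and some simplification, we obtain

$$M_{X,Y}(t_1, t_2) = \exp\left[\mu_1 t_1 + \mu_2 t_2 + \tfrac{1}{2}\left(\sigma_1^2 t_1^2 + \sigma_2^2 t_2^2 + 2\rho\sigma_1\sigma_2 t_1 t_2\right)\right] \qquad (5.5.6)$$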
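To make the iterated-expectation step behind (5.5.5) and (5.5.6) concrete, the Python sketch below computes $E_X\{\exp(t_1 X)\,E[\exp(t_2 Y) \mid X]\}$ by numerically integrating the conditional normal MGF against the $N(\mu_1, \sigma_1^2)$ density of $X$, then compares it with the closed form (5.5.6). It uses a plain trapezoid rule over $\mu_1 \pm 12\sigma_1$; the grid width, number of steps, and function names are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def bvn_mgf_closed(t1, t2, mu1, mu2, s1, s2, rho):
    """Equation (5.5.6)."""
    quad = s1 ** 2 * t1 ** 2 + s2 ** 2 * t2 ** 2 + 2 * rho * s1 * s2 * t1 * t2
    return math.exp(mu1 * t1 + mu2 * t2 + 0.5 * quad)


def bvn_mgf_iterated(t1, t2, mu1, mu2, s1, s2, rho, width=12.0, steps=20_000):
    """E_X{ exp(t1 X) E[exp(t2 Y) | X] } using the conditional distribution of Y given X = x."""
    def integrand(x):
        cond_mean = mu2 + rho * (s2 / s1) * (x - mu1)
        cond_var = s2 ** 2 * (1 - rho ** 2)
        cond_mgf = math.exp(cond_mean * t2 + 0.5 * cond_var * t2 ** 2)  # normal MGF of Y | x
        return math.exp(t1 * x) * cond_mgf * normal_pdf(x, mu1, s1)

    a, b = mu1 - width * s1, mu1 + width * s1
    h = (b - a) / steps
    total = 0.5 * (integrand(a) + integrand(b))
    total += sum(integrand(a + i * h) for i in range(1, steps))
    return total * h
```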


SUMMARY 


The main purpose of this chapter was to develop general properties involving 
both expected values of and functions of random variables. Sums and products 
are important functions of random variables that are given special attention. For 
example, it is shown that the expected value of a sum is the sum of the (marginal) 
expected values. If the random variables are independent, then the expected value
of a product is the product of the (marginal) expected values, and the variance of 
a sum is the sum of the (marginal) variances. 

The correlation coefficient provides a measure of dependence between two 
random variables. When the correlation coefficient is zero, the random variables 
are said to be uncorrelated. For two random variables to be independent, it is 
necessary, but not sufficient, that they be uncorrelated. When the random vari- 
ables are dependent, the conditional expectation is useful in attempting to predict 
the value of one variable given an observed value of the other variable. 


EXERCISES 


1. Let $X_1$, $X_2$, $X_3$, and $X_4$ be independent random variables, each having the same distribution with mean 5 and standard deviation 3, and let $Y = X_1 + 2X_2 + X_3 - X_4$.

(a) Find $E(Y)$.

(b) Find $\mathrm{Var}(Y)$.
2. Suppose the weight (in ounces) of a major league baseball is a random variable $X$ with mean $\mu = 5$ and standard deviation $\sigma = 2/5$. A carton contains 144 baseballs. Assume that the weights of individual baseballs are independent, and let $T$ represent the total weight of all the baseballs in the carton.

(a) Find the expected total weight, $E(T)$.

(b) Find the variance, $\mathrm{Var}(T)$.


3. Suppose $X$ and $Y$ are continuous random variables with joint pdf $f(x, y) = 24xy$ if $0 < x$, $0 < y$, and $x + y < 1$, and zero otherwise.

(a) Find $E(XY)$.

(b) Find the covariance of $X$ and $Y$.

(c) Find the correlation coefficient of $X$ and $Y$.

(d) Find $\mathrm{Cov}(3X, 5Y)$.

(e) Find $\mathrm{Cov}(X + 1, Y - 2)$.

(f) Find $\mathrm{Cov}(X + 1, 5Y - 2)$.

(g) Find $\mathrm{Cov}(3X + 5, X)$.


4. Let $X$ and $Y$ be discrete random variables with joint pdf $f(x, y) = 4/(5xy)$ if $x = 1, 2$ and $y = 2, 3$, and zero otherwise. Find:

(a) $E(X)$.

(b) $E(Y)$.

(c) $E(XY)$.

(d) $\mathrm{Cov}(X, Y)$.

5. Let $X$ and $Y$ be continuous random variables with joint pdf $f(x, y) = x + y$ if $0 < x < 1$ and $0 < y < 1$, and zero otherwise. Find:

(a) $E(X)$.

(b) $E(X + Y)$.

(c) $E(XY)$.

(d) $\mathrm{Cov}(2X, 3Y)$.

(e) $E(Y \mid x)$.


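6. If $X$, $Y$, $Z$, and $W$ are random variables, then show that:
(a) $\mathrm{Cov}(X + Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)$.
(b) $\mathrm{Cov}(X + Y, Z + W) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(X, W) + \mathrm{Cov}(Y, Z) + \mathrm{Cov}(Y, W)$.
(c) $\mathrm{Cov}(X + Y, X - Y) = \mathrm{Var}(X) - \mathrm{Var}(Y)$.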


7. Suppose $X$ and $Y$ are independent random variables with $E(X) = 2$, $E(Y) = 3$, $\mathrm{Var}(X) = 4$, and $\mathrm{Var}(Y) = 16$.
(a) Find $E(5X - Y)$.
(b) Find $\mathrm{Var}(5X - Y)$.
(c) Find $\mathrm{Cov}(3X + Y, Y)$.
(d) Find $\mathrm{Cov}(X, 5X - Y)$.


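8. If $X_1, X_2, \ldots, X_k$ and $Y_1, Y_2, \ldots, Y_m$ are jointly distributed random variables, and if $a_1, a_2, \ldots, a_k$ and $b_1, b_2, \ldots, b_m$ are constants, show that

$$\mathrm{Cov}\left(\sum_{i=1}^{k} a_i X_i,\ \sum_{j=1}^{m} b_j Y_j\right) = \sum_{i=1}^{k}\sum_{j=1}^{m} a_i b_j\,\mathrm{Cov}(X_i, Y_j)$$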


9. Use the result of Exercise 8 to verify equations (5.2.14) and (5.2.15). 


10. For the random variables in Exercise 5, find the approximate mean and variance of $W = XY$.


11. Let $f(x, y) = 6x$, $0 < x < y < 1$, and zero otherwise. Find:
(a) $f_1(x)$.
(b) $f_2(y)$.
(c) $\mathrm{Cov}(X, Y)$.
(d) $\rho$.
(e) $f(y \mid x)$.
(f) $E(Y \mid x)$.


12. Suppose $X$ and $Y$ are continuous random variables with joint pdf $f(x, y) = 4(x - xy)$ if $0 < x < 1$ and $0 < y < 1$, and zero otherwise.

(a) Find $E(X^2 Y)$.

(b) Find $E(X - Y)$.

(c) Find $\mathrm{Var}(X - Y)$.

(d) What is the correlation coefficient of $X$ and $Y$?
(e) What is $E(Y \mid x)$?


13. Let $f(x, y) = 1$ if $0 < y < 2x$, $0 < x < 1$, and zero otherwise. Find:
(a) $f(y \mid x)$.
(b) $E(Y \mid x)$.
(c) $\rho$.

14. For the joint pdf of Exercise 30 in Chapter 4 (page 170):
(a) Find the correlation coefficient of $X$ and $Y$.
(b) Find the conditional expectation, $E(Y \mid x)$.
(c) Find the conditional variance, $\mathrm{Var}(Y \mid x)$.

15. (a) Determine $E(Y \mid x)$ in Exercise 4.
(b) Determine $\mathrm{Var}(Y \mid x)$ in Exercise 4.

16. Let $X$ and $Y$ have joint pdf $f(x, y) = e^{-y}$ if $0 < x < y < \infty$ and zero otherwise. Find $E(X \mid y)$.

17. Suppose that the conditional distribution of $Y$ given $X = x$ is Poisson with mean $E(Y \mid x) = x$, $Y \mid x \sim \mathrm{POI}(x)$, and that $X \sim \mathrm{EXP}(1)$.

(a) Find $E(Y)$.

(b) Find $\mathrm{Var}(Y)$.

18. One box contains five red and six black marbles. A second box contains 10 red and five black marbles. One marble is drawn from box 1 and placed in box 2. Two marbles then are drawn from box 2 without replacement. What is the expected number of red marbles obtained on the second draw?

19. The number of times a batter gets to bat in a game follows a binomial distribution $N \sim \mathrm{BIN}(6, 0.8)$. Given the number of times at bat, $n$, that the batter has, the number of hits he gets conditionally follows a binomial distribution, $X \mid n \sim \mathrm{BIN}(n, 0.3)$.

(a) Find $E(X)$.

(b) Find $\mathrm{Var}(X)$.

(c) Find $E(X^2)$.

20. Let $X$ be the number of customers arriving in a given minute at the drive-up window of a local bank, and let $Y$ be the number who make withdrawals. Assume that $X$ is Poisson distributed with expected value $E(X) = 3$, and that the conditional expectation and variance of $Y$ given $X = x$ are $E(Y \mid x) = x/2$ and $\mathrm{Var}(Y \mid x) = (x + 1)/3$.

(a) Find $E(Y)$.

(b) Find $\mathrm{Var}(Y)$.

(c) Find $E(XY)$.

21. Suppose that $Y_1$ and $Y_2$ are continuous with joint pdf $f(y_1, y_2) = 2e^{-y_1 - y_2}$ if $0 < y_1 < y_2 < \infty$ and zero otherwise. Derive the joint MGF of $Y_1$ and $Y_2$.

22. Find the joint MGF of the continuous random variables $X$ and $Y$ with joint pdf $f(x, y) = e^{-y}$ if $0 < x < y < \infty$ and zero otherwise.

23. Prove Theorem 5.4.7. Hint: Use the joint MGF of Example 5.5.2.

24. Prove Theorem 5.4.8.
25. Let $X_1$ and $X_2$ be independent normal random variables, $X_i \sim N(\mu_i, \sigma_i^2)$, and let $Y_1 = X_1$ and $Y_2 = X_1 + X_2$.
(a) Show that $Y_1$ and $Y_2$ are bivariate normal.
(b) What are the means, variances, and correlation coefficient of $Y_1$ and $Y_2$?
(c) Find the conditional distribution of $Y_2$ given $Y_1 = y_1$. Hint: Use Theorem 5.4.8.


26. Prove Theorem 5.2.4. 
27. Prove Theorem 5.2.5.


FUNCTIONS OF RANDOM VARIABLES


6.1 INTRODUCTION


In Chapter 1, probability was defined first in a set theoretic framework. The concept of a random variable then was introduced so that events could be associated with sets of real numbers in the range space of the random variable. This makes it possible to express mathematically the probability model for the population or characteristic of interest in the form of a pdf or a CDF for the associated random variable, say $X$. In this case, $X$ represents the initial characteristic of interest, and the pdf, $f_X(x)$, may be referred to as the population pdf. It often may be the case that some function of this variable also is of interest. Thus, if $X$ represents the age in weeks of some component, another experimenter may be expressing the age, $Y$, in days, so that $Y = 7X$. Similarly, $W = \ln X$ or some other function of $X$ may be of interest. Any function of a random variable $X$ is itself a random variable, and the probability distribution of a function of $X$ is determined by the probability distribution of $X$. For example, for $Y$ above, $P[14 < Y < 21] = P[2 < X < 3]$, and so on. Clearly, probabilities concerning functions of a random variable may be of interest, and it is useful to be able to express the pdf or CDF of a function of a random variable in terms of the pdf or CDF of the original variable. Such pdf’s sometimes are referred to as “derived” distributions. Of course, a certain pdf may represent a population pdf in one application, but correspond to a derived distribution in a different application.

In Example 4.6.1, the total lifetime, $X_1 + X_2$, of two light bulbs was of interest, and a method of deriving the distribution of this variable was suggested. General techniques for deriving the pdf of a function of random variables will be discussed in this chapter.


THE CDF TECHNIQUE


We will assume that a random variable $X$ has CDF $F_X(x)$, and that some function of $X$ is of interest, say $Y = u(X)$. The idea behind the CDF technique is to express the CDF of $Y$ in terms of the distribution of $X$. Specifically, for each real $y$, we can define a set $A_y = \{x \mid u(x) \le y\}$. It follows that $[Y \le y]$ and $[X \in A_y]$ are equivalent events, in the sense discussed in Section 2.1, and consequently

$$F_Y(y) = P[u(X) \le y] \qquad (6.2.1)$$

which also can be expressed as $P[X \in A_y]$. This probability can be expressed as the integral of the pdf, $f_X(x)$, over the set $A_y$ if $X$ is continuous, or the summation of $f_X(x)$ over $x$ in $A_y$ if $X$ is discrete.

For example, it often is possible to express $[u(X) \le y]$ in terms of an equivalent event $[x_1 \le X \le x_2]$, where one or both of the limits $x_1$ and $x_2$ depend on $y$. In the continuous case,

$$F_Y(y) = \int_{x_1}^{x_2} f_X(x)\,dx = F_X(x_2) - F_X(x_1) \qquad (6.2.2)$$

and, of course, the pdf is $f_Y(y) = (d/dy)F_Y(y)$.


Example 6.2.1
Suppose that $F_X(x) = 1 - e^{-2x}$, $0 < x < \infty$, and consider $Y = e^X$. We have

$$\begin{aligned}
F_Y(y) &= P[Y \le y] \\
&= P[e^X \le y] \\
&= P[X \le \ln y] \\
&= F_X(\ln y) \\
&= 1 - y^{-2}, \qquad 1 < y < \infty
\end{aligned}$$

In this case, $x_1 = -\infty$ and $x_2 = \ln y$, and the pdf of $Y$ is

$$f_Y(y) = \frac{d}{dy} F_Y(y) = 2y^{-3}, \qquad 1 < y < \infty$$
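A quick simulation can confirm Example 6.2.1. The Python sketch below (an illustration, not from the text) draws $X$ with `random.expovariate(2.0)`, which has CDF $1 - e^{-2x}$, forms $Y = e^X$, and compares the empirical CDF with $1 - y^{-2}$; it also checks by a central difference that $2y^{-3}$ is the derivative. The seed, sample size, and helper names are arbitrary assumptions.

```python
import math
import random


def cdf_y(y):
    """F_Y(y) = 1 - y^(-2) for y > 1, derived above for Y = e^X."""
    return 1.0 - y ** -2 if y > 1 else 0.0


def pdf_y(y):
    """f_Y(y) = 2 y^(-3) for y > 1, the derivative of cdf_y."""
    return 2.0 * y ** -3 if y > 1 else 0.0


def empirical_cdf_y(y, n=100_000, seed=1):
    """Fraction of simulated values e^X <= y, with F_X(x) = 1 - e^(-2x)."""
    rng = random.Random(seed)
    # expovariate(2.0) samples the exponential distribution with rate 2 (mean 1/2).
    hits = sum(1 for _ in range(n) if math.exp(rng.expovariate(2.0)) <= y)
    return hits / n
```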


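Example 6.2.2
Consider a continuous random variable $X$, and let $Y = X^2$. It follows that

$$\begin{aligned}
F_Y(y) &= P[X^2 \le y] \\
&= P[-\sqrt{y} \le X \le \sqrt{y}] \\
&= F_X(\sqrt{y}) - F_X(-\sqrt{y}) \qquad (6.2.3)
\end{aligned}$$

The pdf of $Y$ can be expressed easily in terms of the pdf of $X$ in this case, because

$$\begin{aligned}
f_Y(y) &= \frac{d}{dy}\left[F_X(\sqrt{y}) - F_X(-\sqrt{y})\right] \\
&= f_X(\sqrt{y})\,\frac{d}{dy}\sqrt{y} - f_X(-\sqrt{y})\,\frac{d}{dy}(-\sqrt{y}) \\
&= \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] \qquad (6.2.4)
\end{aligned}$$

for $y > 0$.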


Evaluation of equation (6.2.1) can be more complicated than the form given by equation (6.2.2), because a union of intervals may occur.


Example 6.2.3
A signal is sent to a two-sided rotating antenna, and the angle of the antenna at the time the signal is received can be assumed to be uniformly distributed from 0 to $2\pi$, $\Theta \sim \mathrm{UNIF}(0, 2\pi)$. The signal can be received if $Y = \tan\Theta > y_0$. For example, $Y > 1$ corresponds to the angles $45° < \Theta < 90°$ and $225° < \Theta < 270°$. The CDF of $Y$ when $y < 0$ is

$$\begin{aligned}
F_Y(y) &= P[\tan\Theta \le y] \\
&= P[\pi/2 < \Theta \le \pi + \tan^{-1}(y)] + P[3\pi/2 < \Theta \le 2\pi + \tan^{-1}(y)] \\
&= \frac{1}{2\pi}\left[\pi + \tan^{-1}(y) - \pi/2 + 2\pi + \tan^{-1}(y) - 3\pi/2\right] \\
&= 1/2 + (1/\pi)\tan^{-1}(y)
\end{aligned}$$

By symmetry, $P[Y > y] = P[Y < -y]$. Thus, for $y > 0$,

$$\begin{aligned}
F_Y(y) &= 1 - P[Y > y] = 1 - P[Y < -y] \\
&= 1 - [1/2 + (1/\pi)\tan^{-1}(-y)] \\
&= 1/2 + (1/\pi)\tan^{-1}(y)
\end{aligned}$$

It is interesting that

$$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{1}{\pi(1 + y^2)}, \qquad -\infty < y < \infty$$

This is the pdf of a Cauchy distribution (defined in Chapter 3), $Y \sim \mathrm{CAU}(1, 0)$.
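The antenna result can be checked by simulation. The Python sketch below (illustrative only) draws $\Theta \sim \mathrm{UNIF}(0, 2\pi)$ with a seeded generator, forms $Y = \tan\Theta$, and compares the empirical CDF with $1/2 + (1/\pi)\tan^{-1}(y)$; it also verifies numerically that the Cauchy density is the derivative of that CDF. Seed, sample size, and tolerances are assumptions.

```python
import math
import random


def cauchy_cdf(y):
    """F_Y(y) = 1/2 + (1/pi) arctan(y), as derived for Y = tan(Theta)."""
    return 0.5 + math.atan(y) / math.pi


def cauchy_pdf(y):
    """f_Y(y) = 1 / (pi (1 + y^2)), the CAU(1, 0) density."""
    return 1.0 / (math.pi * (1.0 + y * y))


def empirical_cdf_tan(y, n=100_000, seed=2):
    """Fraction of simulated tan(Theta) <= y with Theta ~ UNIF(0, 2*pi)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if math.tan(rng.uniform(0.0, 2.0 * math.pi)) <= y)
    return hits / n
```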


The CDF technique also can be extended to apply to a function of several variables, although the analysis generally is more complicated.


Let $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ be a $k$-dimensional vector of continuous random variables, with joint pdf $f(x_1, x_2, \ldots, x_k)$. If $Y = u(\mathbf{X})$ is a function of $\mathbf{X}$, then

$$F_Y(y) = P[u(\mathbf{X}) \le y] = \int \cdots \int_{A_y} f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k \qquad (6.2.5)$$

where $A_y = \{\mathbf{x} \mid u(\mathbf{x}) \le y\}$.


Of course, the limits of the integral (6.2.5) are functions of $y$, and the convenience of this method will depend on the complexity of the resulting limits.


In Example 4.6.1 we considered the sum of two independent random variables, say $Y = X_1 + X_2$, where $X_i \sim \mathrm{EXP}(1)$. The set required in (6.2.5) is, as shown in Figure 6.1,

$$A_y = \{(x_1, x_2) \mid 0 < x_1 \le y - x_2,\ 0 < x_2 \le y\}$$

FIGURE 6.1 Region $A_y$ such that $x_1 + x_2 \le y$

and consequently

$$F_Y(y) = \int_0^y \int_0^{y - x_2} e^{-(x_1 + x_2)}\,dx_1\,dx_2 = 1 - e^{-y} - y e^{-y}$$

and

$$f_Y(y) = \frac{d}{dy} F_Y(y) = y e^{-y}, \qquad y > 0$$
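The double integral over $A_y$ can also be evaluated numerically, which is a useful habit when the limits of (6.2.5) are awkward. The Python sketch below (an illustration under the assumption that a simple nested midpoint rule with 400 points per dimension is accurate enough) integrates $e^{-(x_1 + x_2)}$ over the triangle $x_1 + x_2 \le y$ and compares the result with $1 - e^{-y} - ye^{-y}$.

```python
import math


def cdf_sum_two_exp(y, n=400):
    """Nested midpoint-rule value of (6.2.5) for Y = X1 + X2 with X_i ~ EXP(1)."""
    if y <= 0:
        return 0.0
    h2 = y / n
    total = 0.0
    for i in range(n):
        x2 = (i + 0.5) * h2
        upper = y - x2  # inner limit: 0 < x1 <= y - x2
        h1 = upper / n
        inner = h1 * sum(math.exp(-((j + 0.5) * h1 + x2)) for j in range(n))
        total += h2 * inner
    return total


def cdf_closed_form(y):
    """F_Y(y) = 1 - e^(-y) - y e^(-y), y > 0."""
    return 1.0 - math.exp(-y) - y * math.exp(-y) if y > 0 else 0.0
```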


It is possible in many cases to derive the pdf directly and without first deriving 
the CDF. 


TRANSFORMATION METHODS


First we will consider transformations of variables in one dimension. Let $u(x)$ be a real-valued function of a real variable $x$. If the equation $y = u(x)$ can be solved uniquely, say $x = w(y)$, then we say the transformation is one-to-one.

It will be necessary to consider discrete and continuous cases separately, and also whether the function is one-to-one.


ONE-TO-ONE TRANSFORMATIONS


Theorem 6.3.1
Discrete Case Suppose that $X$ is a discrete random variable with pdf $f_X(x)$ and that $Y = u(X)$ defines a one-to-one transformation. In other words, the equation $y = u(x)$ can be solved uniquely, say $x = w(y)$. Then the pdf of $Y$ is

$$f_Y(y) = f_X(w(y)), \qquad y \in B \qquad (6.3.1)$$

where $B = \{y \mid f_Y(y) > 0\}$.


Proof
This follows because $f_Y(y) = P[Y = y] = P[u(X) = y] = P[X = w(y)] = f_X(w(y))$. ■
Let $X \sim \mathrm{GEO}(p)$, so that

$$f_X(x) = p q^{x-1}, \qquad x = 1, 2, 3, \ldots$$

Another frequently encountered random variable that also is called geometric is of the form $Y = X - 1$, so that $u(x) = x - 1$, $w(y) = y + 1$, and

$$f_Y(y) = f_X(y + 1) = p q^{y}, \qquad y = 0, 1, 2, \ldots$$

which is nothing more than the pdf of the number of failures before the first success.


Theorem 6.3.2
Continuous Case Suppose that $X$ is a continuous random variable with pdf $f_X(x)$, and assume that $Y = u(X)$ defines a one-to-one transformation from $A = \{x \mid f_X(x) > 0\}$ onto $B = \{y \mid f_Y(y) > 0\}$ with inverse transformation $x = w(y)$. If the derivative $(d/dy)w(y)$ is continuous and nonzero on $B$, then the pdf of $Y$ is

$$f_Y(y) = f_X(w(y)) \left|\frac{d}{dy} w(y)\right|, \qquad y \in B \qquad (6.3.2)$$


Proof
If $y = u(x)$ is one-to-one, then it is either monotonic increasing or monotonic decreasing. If we first assume that it is increasing, then $u(x) \le y$ if and only if $x \le w(y)$. Thus,

$$F_Y(y) = P[u(X) \le y] = P[X \le w(y)] = F_X(w(y))$$

and, consequently,

$$f_Y(y) = \frac{d}{dy} F_X(w(y)) = \frac{dF_X(w(y))}{dw(y)}\,\frac{d}{dy}w(y) = f_X(w(y))\left|\frac{d}{dy}w(y)\right|$$

because $(d/dy)w(y) > 0$ in this case.

In the decreasing case, $u(x) \le y$ if and only if $w(y) \le x$, and thus

$$F_Y(y) = P[u(X) \le y] = P[X \ge w(y)] = 1 - F_X(w(y))$$

and

$$f_Y(y) = -f_X(w(y))\,\frac{d}{dy}w(y) = f_X(w(y))\left|\frac{d}{dy}w(y)\right|$$

because $(d/dy)w(y) < 0$ in this case. ■
In this context, the derivative of w(y) is usually referred to as the Jacobian of
the transformation, and denoted by J = (d/dy)w(y). Note also that transforming a 
continuous random variable is equivalent to the problem of making a change of 
variables in an integral. This should not be surprising, because a continuous pdf 
is simply the function that is integrated over events to obtain probabilities. 


We wish to use Theorem 6.3.2 to determine the pdf of $Y = e^X$ in Example 6.2.1. We obtain the inverse transformation $x = w(y) = \ln y$, and the Jacobian $J = w'(y) = 1/y$, so that

$$f_Y(y) = f_X(\ln y)\left|\frac{1}{y}\right| = 2e^{-2\ln y}\left(\frac{1}{y}\right) = 2y^{-3}, \qquad y \in B = (1, \infty)$$

In a transformation problem, it is always important to identify the set $B$ where $f_Y(y) > 0$, which in this example is $B = (1, \infty)$, because $e^x > 1$ when $x > 0$.
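Theorem 6.3.2 translates almost directly into code. The Python sketch below is a hypothetical helper, `transformed_pdf`, that returns $f_X(w(y))\,|w'(y)|$, approximating the Jacobian $w'(y)$ by a central difference (an assumption made for generality; an exact derivative is preferable when available). Applied to Example 6.2.1 with $w(y) = \ln y$ it should reproduce $2y^{-3}$ on $B = (1, \infty)$.

```python
import math


def transformed_pdf(f_x, w, y, h=1e-6):
    """Theorem 6.3.2: f_Y(y) = f_X(w(y)) |w'(y)|, with w'(y) from a central difference."""
    jacobian = (w(y + h) - w(y - h)) / (2 * h)
    return f_x(w(y)) * abs(jacobian)


def f_x(x):
    """pdf of X in Example 6.2.1, the derivative of F_X(x) = 1 - e^(-2x)."""
    return 2.0 * math.exp(-2.0 * x) if x > 0 else 0.0


def f_y(y):
    """pdf of Y = e^X, using the inverse w(y) = ln y on B = (1, infinity)."""
    return transformed_pdf(f_x, math.log, y) if y > 1 else 0.0
```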


The transformation $Y = e^X$, when applied to a normally distributed random variable, yields a positive-valued random variable with an important special distribution.


A distribution that is related to the normal distribution, but for which the random variable assumes only positive values, is the lognormal distribution, which is defined by the pdf

$$f_Y(y) = \frac{1}{y\sigma\sqrt{2\pi}}\, e^{-(\ln y - \mu)^2/2\sigma^2}, \qquad 0 < y < \infty \qquad (6.3.3)$$

with parameters $-\infty < \mu < \infty$ and $0 < \sigma < \infty$. This will be denoted by $Y \sim \mathrm{LOGN}(\mu, \sigma^2)$, and it is related to the normal distribution by the relationship $Y \sim \mathrm{LOGN}(\mu, \sigma^2)$ if and only if $X = \ln Y \sim N(\mu, \sigma^2)$.

In some cases, the lognormal distribution is reparameterized by letting $\mu = \ln\theta$, which gives

$$f_Y(y) = \frac{1}{y\sigma\sqrt{2\pi}}\, e^{-[\ln(y/\theta)]^2/2\sigma^2} \qquad (6.3.4)$$

and in this notation $\theta$ becomes a scale parameter.

It is clear that cumulative lognormal probabilities can be expressed in terms of normal probabilities, because if $Y \sim \mathrm{LOGN}(\mu, \sigma^2)$, then

$$\begin{aligned}
F_Y(y) &= P[Y \le y] = P[\ln Y \le \ln y] \\
&= P[X \le \ln y] \\
&= \Phi\left(\frac{\ln y - \mu}{\sigma}\right) \qquad (6.3.5)
\end{aligned}$$
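Equation (6.3.5) means lognormal probabilities need nothing beyond the standard normal CDF. The Python sketch below writes $\Phi$ through `math.erf` (using $\Phi(z) = \tfrac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$), then checks that the median is $e^{\mu}$ and that the pdf (6.3.3) is the derivative of the CDF. The function names and the test parameters are illustrative assumptions.

```python
import math


def lognormal_pdf(y, mu, sigma):
    """Equation (6.3.3)."""
    if y <= 0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2 * math.pi))


def lognormal_cdf(y, mu, sigma):
    """Equation (6.3.5): F_Y(y) = Phi((ln y - mu) / sigma), with Phi written via math.erf."""
    if y <= 0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Setting `mu = math.log(theta)` gives the scale-parameter form (6.3.4).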
Another important special distribution is obtained by a log transformation applied to a Pareto variable.


If $X \sim \mathrm{PAR}(1, 1)$, then the pdf of $Z = \ln X$ is

$$f_Z(z) = \frac{e^z}{(1 + e^z)^2}$$

and the CDF is

$$F_Z(z) = \frac{e^z}{1 + e^z}$$

for all real $z$. If we introduce location and scale parameters $\eta$ and $\theta$, respectively, by the transformation $y = u(z) = \eta + \theta z$, then the pdf of $Y = u(Z)$ is

$$f_Y(y) = \frac{1}{\theta}\,\frac{\exp[(y - \eta)/\theta]}{\{1 + \exp[(y - \eta)/\theta]\}^2}$$

for all real $y$. The distribution of $Y$ is known as the logistic distribution, denoted by $Y \sim \mathrm{LOG}(\theta, \eta)$. This is another example of a symmetric distribution, which follows by noting that

$$f_Z(-z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{e^{2z}}{e^{2z}} \cdot \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{e^{z}}{(e^{z} + 1)^2} = f_Z(z)$$

The transformation $y = \eta + \theta z$ provides a general approach to introducing location and scale parameters into a model.

Recall that Theorem 2.4.1 was stated without proof. A proof for the special case of a continuous random variable $X$, under the conditions of Theorem 6.3.2, now will be provided.

Consider the case in which $u(x)$ is an increasing function, so that the inverse transformation, $x = w(y)$, also is increasing. Substituting $x = w(y)$, so that $u(x) = y$, gives

$$\begin{aligned}
E[u(X)] &= \int_{-\infty}^{\infty} u(x) f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} u(w(y))\, f_X(w(y))\, \frac{d}{dy}w(y)\,dy \\
&= \int_{-\infty}^{\infty} y f_Y(y)\,dy \\
&= E(Y)
\end{aligned}$$

The case in which $u(x)$ is decreasing is similar.
A very useful special transformation is given by the following theorem. 


Theorem 6.3.3 
Probability Integral Transformation If $X$ is continuous with CDF $F(x)$, then $U = F(X) \sim \mathrm{UNIF}(0, 1)$.


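Proof
We will prove the theorem in the case where $F(x)$ is one-to-one, so that the inverse, $F^{-1}(u)$, exists:

$$\begin{aligned}
F_U(u) &= P[F(X) \le u] \\
&= P[X \le F^{-1}(u)] \\
&= F(F^{-1}(u)) \\
&= u, \qquad 0 < u < 1
\end{aligned}$$

Because $0 \le F(x) \le 1$, we have $F_U(u) = 0$ if $u \le 0$ and $F_U(u) = 1$ if $u \ge 1$. ■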


A more general proof is obtained if $F^{-1}(u)$ is replaced by the function $G(u)$ that assigns to each value $u$ the minimum value of $x$ such that $u \le F(x)$,

$$G(u) = \min\{x \mid u \le F(x)\}, \qquad 0 < u < 1 \qquad (6.3.7)$$

The function $G(u)$ exists for any CDF, $F(x)$, and it agrees with $F^{-1}(u)$ if $F(x)$ is a one-to-one function.

The following example involves a continuous distribution with a CDF that is not one-to-one.


Let $X$ be a continuous random variable with pdf

$$f(x) = \begin{cases} 1/2 & \text{if } 1 \le |x - 2| \le 2 \\ 0 & \text{otherwise} \end{cases}$$

The CDF of $X$, whose graph is shown in Figure 6.2, is not one-to-one, because it assumes the value 1/2 for all $1 \le x \le 3$.

FIGURE 6.2 A CDF that is continuous but not one-to-one

The function $G(u)$, for this example, is

$$G(u) = \begin{cases} 2u & \text{if } 0 < u \le 1/2 \\ 2(u + 1) & \text{if } 1/2 < u < 1 \end{cases}$$
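Definition (6.3.7) can be computed for any nondecreasing CDF without an explicit inverse. The Python sketch below uses bisection to locate the smallest $x$ with $u \le F(x)$ (the helper names and the tolerance are assumptions), and checks it against the closed form of $G(u)$ for this example. Note that at $u = 1/2$ it returns 1, the left end of the flat stretch of $F$, as the minimum in (6.3.7) requires.

```python
def generalized_inverse(F, u, lo, hi, tol=1e-10):
    """G(u) = min{x : u <= F(x)} from (6.3.7), found by bisection on [lo, hi].

    Assumes F is nondecreasing with F(lo) < u <= F(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) >= u:
            hi = mid  # mid satisfies u <= F(mid), so the minimum lies at or left of mid
        else:
            lo = mid
    return hi


def F_example(x):
    """CDF of the example: density 1/2 on [0, 1] and on [3, 4]."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x / 2
    if x <= 3:
        return 0.5
    if x <= 4:
        return 0.5 + (x - 3) / 2
    return 1.0


def G_example(u):
    """Closed form from the text: 2u on (0, 1/2], 2(u + 1) on (1/2, 1)."""
    return 2 * u if u <= 0.5 else 2 * (u + 1)
```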


The function $G(u)$ has another important application, which follows from the next theorem.


Theorem 6.3.4
Let $F(x)$ be a CDF and let $G(u)$ be the function defined by (6.3.7). If $U \sim \mathrm{UNIF}(0, 1)$, then $X = G(U) \sim F(x)$.


Proof
See Exercise 5.


An important application of the preceding theorem is the generation of “pseudo” random variables from some specified distribution using a computer. In other words, the computer generates data that are distributed as the observations of a random sample from some specified distribution with CDF $F(x)$. Specifically, if $n$ “random numbers,” say $u_1, u_2, \ldots, u_n$, are generated by a random number generator, these represent a “simulated” random sample of size $n$ from $\mathrm{UNIF}(0, 1)$. It follows, then, that $x_1, x_2, \ldots, x_n$, where

$$x_i = G(u_i), \qquad i = 1, 2, \ldots, n \qquad (6.3.8)$$

corresponds to a simulated random sample from a distribution with CDF $F(x)$. Of course, in many examples the CDF is one-to-one, and we could use $x_i = F^{-1}(u_i)$. Equation (6.3.8) also can be used with discrete distributions.
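As a small illustration of (6.3.8) with a one-to-one CDF, the Python sketch below generates exponential variables from uniform random numbers. It assumes the mean-$\theta$ parameterization $F(x) = 1 - e^{-x/\theta}$, $x > 0$, so that $F^{-1}(u) = -\theta\ln(1 - u)$; the seed, sample size, and function names are arbitrary.

```python
import math
import random


def exp_inverse_cdf(u, theta):
    """F^{-1}(u) for F(x) = 1 - exp(-x / theta), x > 0."""
    return -theta * math.log(1.0 - u)


def simulate_exponential(n, theta, seed=3):
    """Simulated sample x_i = F^{-1}(u_i), i = 1, ..., n, as in (6.3.8)."""
    rng = random.Random(seed)
    # rng.random() returns u in [0, 1), so 1 - u is never 0 and the logarithm is defined.
    return [exp_inverse_cdf(rng.random(), theta) for _ in range(n)]
```

With $\theta = 2$, the sample mean should be near 2 and about $1 - e^{-1} \approx 0.632$ of the values should fall at or below 2.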


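If $X \sim \mathrm{BIN}(1, 1/2)$, then

$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1/2 & \text{if } 0 \le x < 1 \\ 1 & \text{if } 1 \le x \end{cases}$$

and

$$G(u) = \begin{cases} 0 & \text{if } 0 < u \le 1/2 \\ 1 & \text{if } 1/2 < u < 1 \end{cases}$$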
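For a distribution on a finite support, $G(u)$ in (6.3.7) is a search over the cumulative probabilities. The Python sketch below uses the standard library's `bisect.bisect_left`, which returns the first index whose cumulative value is at least $u$, and applies it to the $\mathrm{BIN}(1, 1/2)$ case just shown. The helper names, the seed, and the rounding guard are assumptions for illustration.

```python
import bisect
import random


def discrete_G(u, support, cdf_values):
    """G(u) = min{x : u <= F(x)} for a distribution on a finite, sorted support.

    cdf_values[i] holds F(support[i]); bisect_left returns the first index
    whose cumulative probability is >= u.
    """
    i = bisect.bisect_left(cdf_values, u)
    return support[min(i, len(support) - 1)]  # guard against rounding in the last cumulative value


def simulate(n, support, cdf_values, seed=8):
    rng = random.Random(seed)
    return [discrete_G(rng.random(), support, cdf_values) for _ in range(n)]


# X ~ BIN(1, 1/2): support {0, 1} with F(0) = 1/2 and F(1) = 1
SUPPORT = [0, 1]
CDF = [0.5, 1.0]
```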


TRANSFORMATIONS THAT ARE NOT ONE-TO-ONE


Suppose that the function $u(x)$ is not one-to-one over $A = \{x \mid f_X(x) > 0\}$. Although this means that no unique solution to the equation $y = u(x)$ exists, it usually is possible to partition $A$ into disjoint subsets $A_1, A_2, \ldots$ such that $u(x)$ is one-to-one over each $A_j$. Then, for each $y$ in the range of $u(x)$, the equation $y = u(x)$ has a unique solution $x_j = w_j(y)$ over the set $A_j$. In the discrete case, it follows that Theorem 6.3.1 can be extended to functions that are not one-to-one by replacing equation (6.3.1) with

$$f_Y(y) = \sum_j f_X(w_j(y)) \qquad (6.3.9)$$

That is, $f_Y(y) = \sum_j f_X(x_j)$, where the sum is over all $x_j$ such that $u(x_j) = y$.
j 


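Let $f_X(x) = \frac{4}{31}\left(\frac{1}{2}\right)^x$, $x = -2, -1, 0, 1, 2$, and consider $Y = |X|$. Clearly, $B = \{0, 1, 2\}$ and

$$\begin{aligned}
f_Y(0) &= f_X(0) = \tfrac{4}{31} \\
f_Y(1) &= f_X(-1) + f_X(1) = \tfrac{8}{31} + \tfrac{2}{31} = \tfrac{10}{31} \\
f_Y(2) &= f_X(-2) + f_X(2) = \tfrac{16}{31} + \tfrac{1}{31} = \tfrac{17}{31}
\end{aligned}$$

Another way to express this is

$$f_Y(y) = \begin{cases} \dfrac{4}{31} & y = 0 \\[8pt] \dfrac{4}{31}\left[\left(\dfrac{1}{2}\right)^{y} + \left(\dfrac{1}{2}\right)^{-y}\right] & y = 1, 2 \end{cases}$$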
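Equation (6.3.9) amounts to pooling the probability of every $x$ that maps to the same $y$. The Python sketch below implements that pooling with exact fractions (the helper name `pmf_of_function` is hypothetical) and applies it to $Y = |X|$ with the pdf $f_X(x) = \frac{4}{31}(\frac{1}{2})^x$ used in this example.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(pmf_x, u):
    """Equation (6.3.9): f_Y(y) is the sum of f_X(x_j) over all x_j with u(x_j) = y."""
    pmf_y = defaultdict(Fraction)  # Fraction() is 0
    for x, p in pmf_x.items():
        pmf_y[u(x)] += p
    return dict(pmf_y)


# f_X(x) = (4/31)(1/2)^x on x = -2, ..., 2
pmf_x = {x: Fraction(4, 31) * Fraction(1, 2) ** x for x in range(-2, 3)}
pmf_y = pmf_of_function(pmf_x, abs)
```

Here `pmf_y` should equal `{0: 4/31, 1: 10/31, 2: 17/31}` as exact fractions.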


An expression analogous to equation (6.3.9) for the discrete case is obtained for continuous functions that are not one-to-one by extending equation (6.3.2) to

$$f_Y(y) = \sum_j f_X(w_j(y))\left|\frac{d}{dy} w_j(y)\right| \qquad (6.3.10)$$

That is, for the transformation $Y = u(X)$, the summation is again over all the values of $j$ for which $u(x_j) = y$, although the Jacobian enters into the equation for the continuous case. We found by the cumulative method in Example 6.2.2 that if $Y = X^2$, then

$$F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y})$$

and by taking derivatives

$$f_Y(y) = \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] \qquad (6.3.11)$$

Equation (6.3.11) also follows directly from the transformation method by applying equation (6.3.10) with $w_1(y) = -\sqrt{y}$ and $w_2(y) = \sqrt{y}$.
Suppose that $X \sim \mathrm{UNIF}(-1, 1)$ and $Y = X^2$. If we partition $A = (-1, 1)$ into $A_1 = (-1, 0)$ and $A_2 = (0, 1)$, then $y = x^2$ has unique solutions $x_1 = w_1(y) = -\sqrt{y}$ and $x_2 = w_2(y) = \sqrt{y}$ over these intervals. We can neglect the point $x = 0$ in this partition, because $X$ is continuous. The pdf of $Y$ is thus

$$f_Y(y) = f_X(-\sqrt{y})\left|\frac{-1}{2\sqrt{y}}\right| + f_X(\sqrt{y})\left|\frac{1}{2\sqrt{y}}\right| = \frac{1}{2\sqrt{y}}, \qquad y \in B = (0, 1)$$

If the limits of the function $u(x)$ are not the same over each set $A_j$ of the partition, then greater care must be exercised in applying the equations. This is illustrated in the following example.


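With $f_X(x) = x^2/3$, $-1 < x < 2$, and zero otherwise, consider the transformation $Y = X^2$. In general, we still have the inverse transformations $x_1 = w_1(y) = -\sqrt{y}$ for $x < 0$ and $x_2 = w_2(y) = \sqrt{y}$ for $x > 0$; however, for $0 < y < 1$ there are two points with nonzero pdf, namely $x_1 = -\sqrt{y}$ and $x_2 = \sqrt{y}$, that map into $y$, whereas for $1 \le y < 4$ there is only one point with nonzero pdf, $x_2 = \sqrt{y}$, that maps into $y$. Thus the pdf of $Y$ is

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] = \dfrac{1}{2\sqrt{y}}\left[\dfrac{y}{3} + \dfrac{y}{3}\right] = \dfrac{\sqrt{y}}{3} & 0 < y < 1 \\[10pt] \dfrac{1}{2\sqrt{y}}\, f_X(\sqrt{y}) = \dfrac{1}{2\sqrt{y}} \cdot \dfrac{y}{3} = \dfrac{\sqrt{y}}{6} & 1 \le y < 4 \end{cases}$$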
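A simulation makes the two-branch answer easy to check. The Python sketch below draws $X$ from $f_X(x) = x^2/3$ on $(-1, 2)$ by inverting $F_X(x) = (x^3 + 1)/9$ (a real cube root handles negative arguments), squares it, and compares the empirical CDF of $Y$ with the integral of the derived pdf: $2y^{3/2}/9$ for $0 < y < 1$ and $(y^{3/2} + 1)/9$ for $1 \le y < 4$. The seed, sample size, and names are assumptions.

```python
import math
import random


def sample_x(rng):
    """Inverse-CDF draw from f_X(x) = x^2/3 on (-1, 2), where F_X(x) = (x^3 + 1)/9."""
    v = 9.0 * rng.random() - 1.0
    return math.copysign(abs(v) ** (1.0 / 3.0), v)  # real cube root, valid for v < 0


def cdf_y_sq(y):
    """Integral of the derived pdf: sqrt(y)/3 on (0, 1) and sqrt(y)/6 on [1, 4)."""
    if y <= 0:
        return 0.0
    if y < 1:
        return 2.0 * y ** 1.5 / 9.0
    if y < 4:
        return (y ** 1.5 + 1.0) / 9.0
    return 1.0


def empirical_cdf_y_sq(y, n=100_000, seed=4):
    """Fraction of simulated X^2 <= y."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if sample_x(rng) ** 2 <= y) / n
```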


In the previous example, notice that it is possible to solve the problem without explicitly using the functional notation $u(x)$ or $w_j(y)$. Thus, as suggested earlier, a simpler way of expressing equations (6.3.9) and (6.3.10), respectively, is

$$f_Y(y) = \sum_j f_X(x_j) \qquad (6.3.12)$$

and

$$f_Y(y) = \sum_j f_X(x_j)\left|\frac{dx_j}{dy}\right| \qquad (6.3.13)$$

where it must be kept in mind that $x_j = w_j(y)$ is a function of $y$. This simpler notation will be convenient in expressing the results of joint transformations of several random variables.


JOINT TRANSFORMATIONS


The preceding theorems can be extended to apply to functions of several random variables.

Consider the geometric variables $X, Y \sim \mathrm{GEO}(p)$ of Example 4.4.3. As we found in this example, the joint pdf of $X$ and $T = X + Y$ can be expressed in terms of the joint pdf of $X$ and $Y$, namely

$$f_{X,T}(x, t) = f_{X,Y}(x, t - x)$$

where $x$ and $t - x$ are the solutions of the joint transformation $x = u_1(x, y) = x$ and $t = u_2(x, y) = x + y$.

This can be generalized as follows. Consider a $k$-dimensional vector $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ of random variables, and suppose that $u_1(\mathbf{x}), u_2(\mathbf{x}), \ldots, u_k(\mathbf{x})$ are $k$ functions of $\mathbf{x}$, so that $Y_i = u_i(\mathbf{X})$ for $i = 1, \ldots, k$ defines another vector of random variables, $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_k)$. A more concise way to express this is $\mathbf{Y} = \mathbf{u}(\mathbf{X})$.


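In the discrete case, we can state a $k$-dimensional version of Theorem 6.3.1.


Theorem 6.3.5
If $\mathbf{X}$ is a vector of discrete random variables with joint pdf $f_{\mathbf{X}}(\mathbf{x})$ and $\mathbf{Y} = \mathbf{u}(\mathbf{X})$ defines a one-to-one transformation, then the joint pdf of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(y_1, y_2, \ldots, y_k) = f_{\mathbf{X}}(x_1, x_2, \ldots, x_k) \qquad (6.3.14)$$

where $x_1, x_2, \ldots, x_k$ are the solutions of $\mathbf{y} = \mathbf{u}(\mathbf{x})$, and consequently depend on $y_1, y_2, \ldots, y_k$.


If the transformation is not one-to-one, and if a partition exists, say $A_1, A_2, \ldots$, such that the equation $\mathbf{y} = \mathbf{u}(\mathbf{x})$ has a unique solution $\mathbf{x} = \mathbf{x}_j$ or

$$\mathbf{x}_j = (x_{1j}, x_{2j}, \ldots, x_{kj}) \qquad (6.3.15)$$

over $A_j$, then the pdf of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(y_1, \ldots, y_k) = \sum_j f_{\mathbf{X}}(x_{1j}, \ldots, x_{kj}) \qquad (6.3.16)$$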
j 


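Joint transformations of continuous random variables can be accomplished, although the notion of the Jacobian must be generalized. Suppose, for example, that $u_1(x_1, x_2)$ and $u_2(x_1, x_2)$ are functions, and $x_1$ and $x_2$ are unique solutions to the transformation $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$. Then the Jacobian of the transformation is the determinant

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[8pt] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} \qquad (6.3.17)$$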
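It is desired to transform $x_1$ and $x_2$ into $x_1$ and the product $x_1 x_2$. Specifically, let $y_1 = x_1$ and $y_2 = x_1 x_2$. The solution is $x_1 = y_1$ and $x_2 = y_2/y_1$, and the Jacobian is

$$J = \begin{vmatrix} 1 & 0 \\ -y_2/y_1^2 & 1/y_1 \end{vmatrix} = \frac{1}{y_1}$$

For a transformation of $k$ variables $\mathbf{y} = \mathbf{u}(\mathbf{x})$, with a unique solution $\mathbf{x} = (x_1, x_2, \ldots, x_k)$, the Jacobian is the determinant of the $k \times k$ matrix of partial derivatives:

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_k} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_k}{\partial y_1} & \dfrac{\partial x_k}{\partial y_2} & \cdots & \dfrac{\partial x_k}{\partial y_k} \end{vmatrix} \qquad (6.3.18)$$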
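When the partial derivatives are tedious, the determinant (6.3.17) can be checked numerically. The Python sketch below (a hypothetical helper, not from the text) approximates each partial derivative of the inverse map $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$ by a central difference with step `h`, an assumed value. For the product example above it should return about $1/y_1$.

```python
def jacobian_2d(w1, w2, y1, y2, h=1e-6):
    """Determinant (6.3.17) of the partials of x1 = w1(y1, y2), x2 = w2(y1, y2), by central differences."""
    d11 = (w1(y1 + h, y2) - w1(y1 - h, y2)) / (2 * h)  # dx1/dy1
    d12 = (w1(y1, y2 + h) - w1(y1, y2 - h)) / (2 * h)  # dx1/dy2
    d21 = (w2(y1 + h, y2) - w2(y1 - h, y2)) / (2 * h)  # dx2/dy1
    d22 = (w2(y1, y2 + h) - w2(y1, y2 - h)) / (2 * h)  # dx2/dy2
    return d11 * d22 - d12 * d21


# Inverse of y1 = x1, y2 = x1 * x2: x1 = y1, x2 = y2 / y1
def w1(y1, y2):
    return y1


def w2(y1, y2):
    return y2 / y1
```

The same helper gives 1 for the inverse map $x_1 = y_1$, $x_2 = y_2 - y_1$, and $1/2$ for $x_1 = (y_1 + y_2)/2$, $x_2 = (y_2 - y_1)/2$.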


Theorem 6.3.2 can be generalized as follows.


Theorem 6.3.6
Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ is a vector of continuous random variables with joint pdf $f_{\mathbf{X}}(x_1, x_2, \ldots, x_k) > 0$ on $A$, and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_k)$ is defined by the one-to-one transformation

$$Y_i = u_i(X_1, X_2, \ldots, X_k), \qquad i = 1, 2, \ldots, k$$

If the Jacobian is continuous and nonzero over the range of the transformation, then the joint pdf of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(y_1, \ldots, y_k) = f_{\mathbf{X}}(x_1, \ldots, x_k)\,|J| \qquad (6.3.19)$$

where $\mathbf{x} = (x_1, \ldots, x_k)$ is the solution of $\mathbf{y} = \mathbf{u}(\mathbf{x})$.


Proof
As noted earlier, the problem of finding the pdf of a function of a random variable is related to a change of variables in an integral. This approach extends readily to transformations of $k$ variables. Denote by $B$ the range of a transformation $\mathbf{y} = \mathbf{u}(\mathbf{x})$ with inverse $\mathbf{x} = \mathbf{w}(\mathbf{y})$. Assume $D \subset B$, and let $C$ be the set of all points $\mathbf{x} = (x_1, \ldots, x_k)$ that map into $D$ under the transformation. We have

$$P[\mathbf{Y} \in D] = \int \cdots \int_D f_{\mathbf{Y}}(y_1, \ldots, y_k)\,dy_1 \cdots dy_k = \int \cdots \int_C f_{\mathbf{X}}(x_1, \ldots, x_k)\,dx_1 \cdots dx_k$$

But this also can be written as

$$\int \cdots \int_D f_{\mathbf{X}}(w_1(y_1, \ldots, y_k), \ldots, w_k(y_1, \ldots, y_k))\,|J|\,dy_1 \cdots dy_k$$

as the result of a standard theorem on change of variables in an integral. Because this is true for arbitrary $D \subset B$, equation (6.3.19) follows. ■


Example 6.3.12
Let $X_1$ and $X_2$ be independent and exponential, $X_i \sim \mathrm{EXP}(1)$. Thus, the joint pdf is

$$f_{X_1, X_2}(x_1, x_2) = e^{-(x_1 + x_2)}, \qquad (x_1, x_2) \in A$$

where $A = \{(x_1, x_2) \mid 0 < x_1, 0 < x_2\}$. Consider the random variables $Y_1 = X_1$ and $Y_2 = X_1 + X_2$. This corresponds to the transformation $y_1 = x_1$ and $y_2 = x_1 + x_2$, which has a unique solution, $x_1 = y_1$ and $x_2 = y_2 - y_1$. The Jacobian is

$$J = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1$$

and thus

$$f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(y_1, y_2 - y_1) = e^{-y_2}, \qquad (y_1, y_2) \in B$$

and zero otherwise. The set $B$ is obtained by transforming the set $A$, and this corresponds to $y_1 = x_1 > 0$ and $y_2 - y_1 = x_2 > 0$. Thus, $B = \{(y_1, y_2) \mid 0 < y_1 < y_2 < \infty\}$, which is a triangular region in the plane with boundaries $y_1 = 0$ and $y_2 = y_1$. The regions $A$ and $B$ are shown in Figure 6.3.

FIGURE 6.3 Regions corresponding to the transformation $y_1 = x_1$ and $y_2 = x_1 + x_2$
The marginal pdf’s of $Y_1$ and $Y_2$ are given as follows:

$$f_{Y_1}(y_1) = \int_{y_1}^{\infty} e^{-y_2}\,dy_2 = e^{-y_1}, \qquad y_1 > 0$$

$$f_{Y_2}(y_2) = \int_{0}^{y_2} e^{-y_2}\,dy_1 = y_2 e^{-y_2}, \qquad y_2 > 0$$

Note that $Y_2 \sim \mathrm{GAM}(1, 2)$.


Example 6.3.13 Suppose that, instead of the transformation of the previous example, we consider a different transformation, $y_1 = x_1 - x_2$ and $y_2 = x_1 + x_2$. The solution is $x_1 = (y_1 + y_2)/2$ and $x_2 = (y_2 - y_1)/2$, so the Jacobian is

$$J = \begin{vmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{vmatrix} = \frac{1}{2}$$

The joint pdf is given by

$$f_{Y_1, Y_2}(y_1, y_2) = \frac{1}{2} e^{-y_2}, \qquad (y_1, y_2) \in B$$

where, in this example, $B = \{(y_1, y_2) \mid -y_2 < y_1 < y_2,\ y_2 > 0\}$ with boundaries $y_2 = -y_1$ and $y_2 = y_1$. The region $A$ is the same as in Figure 6.3, but $B$ is different, as shown in Figure 6.4.

FIGURE 6.4 Region corresponding to the transformation $y_1 = x_1 - x_2$ and $y_2 = x_1 + x_2$
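The marginal pdf’s of $Y_1$ and $Y_2$ are

$$f_{Y_1}(y_1) = \begin{cases} \displaystyle\int_{-y_1}^{\infty} \frac{1}{2} e^{-y_2}\,dy_2 = \frac{1}{2} e^{y_1} & y_1 < 0 \\[10pt] \displaystyle\int_{y_1}^{\infty} \frac{1}{2} e^{-y_2}\,dy_2 = \frac{1}{2} e^{-y_1} & y_1 \ge 0 \end{cases}$$

$$= \frac{1}{2} e^{-|y_1|}, \qquad -\infty < y_1 < \infty$$

$$f_{Y_2}(y_2) = \int_{-y_2}^{y_2} \frac{1}{2} e^{-y_2}\,dy_1 = y_2 e^{-y_2}, \qquad y_2 > 0$$

As in the previous example, $Y_2 \sim \mathrm{GAM}(1, 2)$, and $Y_1$ has the double exponential distribution $\mathrm{DE}(1, 0)$.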
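The double exponential result for $Y_1 = X_1 - X_2$ can be checked by simulation. The Python sketch below draws pairs of independent $\mathrm{EXP}(1)$ variables with `random.expovariate(1.0)` and compares the empirical CDF of their difference with the $\mathrm{DE}(1, 0)$ CDF, which is $\frac{1}{2}e^{y}$ for $y < 0$ and $1 - \frac{1}{2}e^{-y}$ for $y \ge 0$. Seed, sample size, and names are illustrative assumptions.

```python
import math
import random


def laplace_cdf(y):
    """CDF of DE(1, 0), whose pdf is (1/2) e^{-|y|}."""
    return 0.5 * math.exp(y) if y < 0 else 1.0 - 0.5 * math.exp(-y)


def empirical_cdf_diff(y, n=100_000, seed=5):
    """Fraction of simulated Y1 = X1 - X2 <= y with X1, X2 independent EXP(1)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.expovariate(1.0) - rng.expovariate(1.0) <= y) / n
```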


It is possible to extend Theorem 6.3.6 to transformations that are not one-to-one in a manner similar to equation (6.3.13). Specifically, if the equation $\mathbf{y} = \mathbf{u}(\mathbf{x})$ can be solved uniquely over each set in a partition $A_1, A_2, \ldots$, to yield solutions such as in equation (6.3.15), and if these solutions have nonzero continuous Jacobians, then

$$f_{\mathbf{Y}}(y_1, \ldots, y_k) = \sum_j f_{\mathbf{X}}(x_{1j}, \ldots, x_{kj})\,|J_j| \qquad (6.3.20)$$

where $J_j$ is the Jacobian of the solution over $A_j$.


An important application of equation (6.3.20) will be considered in Section 6.5, but first we will consider methods for dealing with sums.


SUMS OF RANDOM VARIABLES


Special methods are provided here for dealing with the important special case of sums of random variables.


CONVOLUTION FORMULA


If one is interested only in the pdf of a sum $S = X_1 + X_2$, where $X_1$ and $X_2$ are continuous with joint pdf $f(x_1, x_2)$, then a general formula can be derived using the approach of Example 6.3.12, namely

$$f_S(s) = \int_{-\infty}^{\infty} f(t, s - t)\,dt \qquad (6.4.1)$$
If $X_1$ and $X_2$ are independent, then this usually is referred to as the convolution formula,

$$f_S(s) = \int_{-\infty}^{\infty} f_{X_1}(t)\, f_{X_2}(s - t)\,dt \qquad (6.4.2)$$


Example 6.4.1 Let $X_1$ and $X_2$ be independent and uniform, $X_i \sim \mathrm{UNIF}(0, 1)$, and let $S = X_1 + X_2$. The region $B$ corresponding to the transformation $t = x_1$ and $s = x_1 + x_2$ is $B = \{(s, t) \mid 0 < t < s < t + 1 < 2\}$, and this is shown in Figure 6.5.

FIGURE 6.5 Regions corresponding to the transformation $t = x_1$ and $s = x_1 + x_2$

Thus, from equation (6.4.2) we have

$$f_S(s) = \begin{cases} \displaystyle\int_0^s dt = s & 0 < s < 1 \\[10pt] \displaystyle\int_{s-1}^{1} dt = 2 - s & 1 \le s < 2 \end{cases}$$

$$= 1 - |s - 1|, \qquad 0 < s < 2$$

and zero otherwise.
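The convolution formula (6.4.2) is a one-dimensional integral, so it is easy to evaluate numerically. The Python sketch below uses a midpoint rule over an assumed integration range `[lo, hi]` that covers the support of $f_{X_1}$ (the helper names and step count are illustrative, and the midpoint rule is only first-order accurate at the jumps of a uniform density). It should reproduce the triangular density $1 - |s - 1|$ of Example 6.4.1 and, for two $\mathrm{EXP}(1)$ densities, the value $s e^{-s}$ found earlier for a sum of exponentials.

```python
import math


def convolution_pdf(f1, f2, s, lo, hi, steps=20_000):
    """Equation (6.4.2): integral of f1(t) f2(s - t) dt over [lo, hi], by the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(f1(lo + (i + 0.5) * h) * f2(s - lo - (i + 0.5) * h) for i in range(steps))


def unif01(x):
    """UNIF(0, 1) density."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def exp1(x):
    """EXP(1) density."""
    return math.exp(-x) if x > 0 else 0.0


def triangular(s):
    """Result of Example 6.4.1: 1 - |s - 1| on (0, 2), zero otherwise."""
    return 1.0 - abs(s - 1.0) if 0.0 < s < 2.0 else 0.0
```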
In some cases it may be necessary to consider points other than just the boundaries in determining the new range space $B$. Care must be exercised in determining the appropriate limits of integration, depending on $B$.


Example 6.4.2 Suppose that $X_1$ and $X_2$ are independent gamma variables, $X_1 \sim \mathrm{GAM}(1, \alpha)$ and $X_2 \sim \mathrm{GAM}(1, \beta)$, with joint pdf

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\, x_1^{\alpha-1} x_2^{\beta-1} e^{-x_1-x_2} \qquad 0 < x_i < \infty$$

Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1/(X_1 + X_2)$, with inverse transformations $x_1 = y_1 y_2$ and $x_2 = y_1(1 - y_2)$. We have

$$J = \begin{vmatrix} y_2 & y_1 \\ 1 - y_2 & -y_1 \end{vmatrix} = -y_1$$

and

$$\begin{aligned} f_{Y_1,Y_2}(y_1, y_2) &= \frac{1}{\Gamma(\alpha)\Gamma(\beta)} (y_1 y_2)^{\alpha-1} [y_1(1 - y_2)]^{\beta-1} e^{-y_1} \left| -y_1 \right| \\ &= \frac{y_2^{\alpha-1}(1 - y_2)^{\beta-1}\, y_1^{\alpha+\beta-1} e^{-y_1}}{\Gamma(\alpha)\Gamma(\beta)} \qquad (6.4.3) \end{aligned}$$

if $(y_1, y_2) \in B = \{(y_1, y_2) \mid 0 < y_1 < \infty,\ 0 < y_2 < 1\}$ and zero otherwise. The boundary composed of the positive segment of the line $x_1 = 0$ maps into the positive segment of the line $y_2 = 0$, and the positive segment of the line $x_2 = 0$ maps into the positive segment of the line $y_2 = 1$, because $y_2 = x_1/(x_1 + x_2) = x_1/x_1 = 1$ and $y_1 = x_1 > 0$ when $x_2 = 0$. This does not completely bound $B$, but consider the positive segment of the line $x_2 = kx_1$. As $k$ takes on values between 0 and $\infty$, all points in $A$ are included, and such a line maps into the line $y_2 = x_1/(x_1 + kx_1) = 1/(1 + k)$, where $y_1 = x_1 + kx_1 = (1 + k)x_1$ goes between 0 and $\infty$ as $0 < x_1 < \infty$. Thus, as $k$ goes from 0 to $\infty$, $B$ is composed of all the parallel lines between $y_2 = 0$ and $y_2 = 1$.


It is interesting to observe from equation (6.4.3) that $Y_1$ and $Y_2$ are independent random variables and $Y_1 \sim \mathrm{GAM}(1, \alpha + \beta)$.
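A small simulation can make the conclusion of Example 6.4.2 concrete. The sketch assumes illustrative values $\alpha = 2$ and $\beta = 3$ and relies on Python's `random.gammavariate`, whose arguments are (shape, scale), so a scale of 1 matches $\mathrm{GAM}(1, \alpha)$. From (6.4.3), the marginal pdf of $Y_2$ is proportional to $y_2^{\alpha-1}(1 - y_2)^{\beta-1}$, a beta density with mean $\alpha/(\alpha + \beta)$. A sample correlation near zero is consistent with independence but does not by itself establish it; the helper names are hypothetical.

```python
import random
import statistics


def simulate_example_642(alpha, beta, n=100_000, seed=1):
    """Draw X1 ~ GAM(1, alpha), X2 ~ GAM(1, beta) independently and return
    samples of Y1 = X1 + X2 and Y2 = X1 / (X1 + X2)."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        x1 = rng.gammavariate(alpha, 1.0)   # shape alpha, scale 1
        x2 = rng.gammavariate(beta, 1.0)    # shape beta, scale 1
        y1.append(x1 + x2)
        y2.append(x1 / (x1 + x2))
    return y1, y2


def sample_corr(a, b):
    """Pearson sample correlation, written out to avoid version-specific helpers."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))


alpha, beta = 2.0, 3.0
y1, y2 = simulate_example_642(alpha, beta)
print("mean, var of Y1:", statistics.fmean(y1), statistics.variance(y1))  # both alpha + beta
print("mean of Y2:", statistics.fmean(y2))                                # alpha / (alpha + beta)
print("corr(Y1, Y2):", sample_corr(y1, y2))                               # near 0
```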


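Example 6.4.3 Assume that $X_1$, $X_2$, and $X_3$ are independent gamma variables, $X_i \sim \mathrm{GAM}(1, \alpha_i)$; $i = 1, 2, 3$. The joint pdf is

$$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = \frac{1}{\Gamma(\alpha_1)\Gamma(\alpha_2)\Gamma(\alpha_3)}\, x_1^{\alpha_1-1} x_2^{\alpha_2-1} x_3^{\alpha_3-1} e^{-x_1-x_2-x_3} \qquad 0 < x_i < \infty$$

Let $Y_i = X_i \big/ \sum_{j=1}^{3} X_j$, $i = 1, 2$; and $Y_3 = \sum_{j=1}^{3} X_j$, with inverse transformation

$$x_1 = y_1 y_3, \qquad x_2 = y_2 y_3, \qquad x_3 = y_3(1 - y_1 - y_2)$$

We have

$$J = \begin{vmatrix} y_3 & 0 & y_1 \\ 0 & y_3 & y_2 \\ -y_3 & -y_3 & 1 - y_1 - y_2 \end{vmatrix} = y_3^2$$

and

$$f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = \frac{y_3^{\alpha_1+\alpha_2+\alpha_3-1} e^{-y_3}\, y_1^{\alpha_1-1} y_2^{\alpha_2-1} (1 - y_1 - y_2)^{\alpha_3-1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)\Gamma(\alpha_3)} \qquad (y_1, y_2, y_3) \in B$$

where

$$B = \{(y_1, y_2, y_3) \mid 0 < y_1,\ 0 < y_2,\ y_1 + y_2 < 1,\ 0 < y_3 < \infty\}$$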

We see in this case that $Y_3$ is again a gamma variable and is independent of $Y_1$ and $Y_2$, although $Y_1$ and $Y_2$ are not independent of each other. The joint density of $Y_1$ and $Y_2$ is known as a Dirichlet distribution. A similar pattern will hold if this transformation is extended to $k$ variables. In particular, if $X_i \sim \mathrm{GAM}(1, \alpha_i)$, $i = 1, \ldots, k$, and are independent, then

$$Y_k = \sum_{i=1}^{k} X_i \sim \mathrm{GAM}\left(1, \sum_{i=1}^{k} \alpha_i\right)$$


Sums of independent random variables often arise in practice. A technique 
based on moment generating functions usually is much more convenient than 
using transformations for determining the distribution of sums of independent 
random variables; this approach will be discussed next. 


MOMENT GENERATING FUNCTION METHOD 


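Theorem 6.4.1 If $X_1, \ldots, X_n$ are independent random variables with MGFs $M_{X_i}(t)$, then the MGF of $Y = \sum_{i=1}^{n} X_i$ is

$$M_Y(t) = M_{X_1}(t) \cdots M_{X_n}(t) \qquad (6.4.4)$$

Proof
Notice that $e^{tY} = e^{t(X_1 + \cdots + X_n)} = e^{tX_1} \cdots e^{tX_n}$, so by property (5.2.6),

$$\begin{aligned} M_Y(t) &= E(e^{tY}) \\ &= E(e^{tX_1} \cdots e^{tX_n}) \\ &= E(e^{tX_1}) \cdots E(e^{tX_n}) \\ &= M_{X_1}(t) \cdots M_{X_n}(t) \end{aligned}$$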
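Theorem 6.4.1 can be checked exactly on small discrete examples: build the pmf of the sum by direct enumeration, then compare its MGF with the product of the individual MGFs. The four-sided die and the Bernoulli(0.3) variable below are arbitrary choices for illustration, and `mgf` and `pmf_of_sum` are hypothetical helper names.

```python
import math
from itertools import product


def mgf(pmf, t):
    """M(t) = E[exp(tX)] for a finite pmf given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


def pmf_of_sum(pmf1, pmf2):
    """pmf of X1 + X2 for independent discrete X1 and X2 (direct convolution)."""
    out = {}
    for (x1, p1), (x2, p2) in product(pmf1.items(), pmf2.items()):
        out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out


die = {x: 0.25 for x in range(1, 5)}   # four-sided die, DU(4)
coin = {0: 0.7, 1: 0.3}                # Bernoulli(0.3), an arbitrary choice
total = pmf_of_sum(die, coin)

for t in (-1.0, 0.3, 0.8):
    lhs = mgf(total, t)                   # MGF of the sum, from its own pmf
    rhs = mgf(die, t) * mgf(coin, t)      # product of MGFs, equation (6.4.4)
    print(f"t = {t}: {lhs:.10f} vs {rhs:.10f}")
```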


This has a special form when $X_1, \ldots, X_n$ represent a random sample from a population with common pdf $f(x)$ and MGF $M(t)$, namely

$$M_Y(t) = [M(t)]^n \qquad (6.4.5)$$
As noted in Chapter 2, the MGF of a random variable uniquely determines its distribution.

It is clear that the MGF can be used as a technique for determining the distribution of a function of a random variable, and it is undoubtedly more important for this purpose than for computing moments. The MGF approach is particularly useful for determining the distribution of a sum of independent random variables, and it often will be much more convenient than trying to carry out a joint transformation. If the MGF of a variable is ascertained, then it is necessary to recognize what distribution has that MGF. The MGFs of many of the most common distributions have been included in Appendix B.


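Example 6.4.4 Let $X_1, \ldots, X_k$ be independent binomial random variables with respective parameters $n_i$ and $p$, $X_i \sim \mathrm{BIN}(n_i, p)$, and let $Y = \sum_{i=1}^{k} X_i$. It follows that

$$\begin{aligned} M_Y(t) &= M_{X_1}(t) \cdots M_{X_k}(t) \\ &= (pe^t + q)^{n_1} \cdots (pe^t + q)^{n_k} \\ &= (pe^t + q)^{n_1 + \cdots + n_k} \end{aligned}$$

We recognize that this is the binomial MGF with parameters $n_1 + \cdots + n_k$ and $p$, and thus, $Y \sim \mathrm{BIN}(n_1 + \cdots + n_k, p)$.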
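The conclusion $Y \sim \mathrm{BIN}(n_1 + \cdots + n_k, p)$ can also be confirmed without MGFs by convolving the individual pmfs. The sketch assumes illustrative values $n_1 = 2$, $n_2 = 5$, $n_3 = 4$, and $p = 0.35$; the printed maximum discrepancy from the $\mathrm{BIN}(11, 0.35)$ pmf should be on the order of floating-point rounding. The helper names are hypothetical.

```python
from math import comb


def binom_pmf(n, p):
    """List of P[X = x], x = 0..n, for X ~ BIN(n, p)."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def convolve(a, b):
    """pmf of the sum of two independent variables on 0, 1, 2, ... (as lists)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


p = 0.35
sizes = [2, 5, 4]                  # n_1, n_2, n_3, chosen for illustration
pmf_sum = binom_pmf(sizes[0], p)
for n_i in sizes[1:]:
    pmf_sum = convolve(pmf_sum, binom_pmf(n_i, p))

target = binom_pmf(sum(sizes), p)  # BIN(n_1 + n_2 + n_3, p)
print(max(abs(u - v) for u, v in zip(pmf_sum, target)))
```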


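Example 6.4.5 Let $X_1, \ldots, X_n$ be independent Poisson-distributed random variables, $X_i \sim \mathrm{POI}(\mu_i)$, and let $Y = X_1 + \cdots + X_n$. The MGF of $X_i$ is $M_{X_i}(t) = \exp[\mu_i(e^t - 1)]$, and consequently the MGF of $Y$ is

$$\begin{aligned} M_Y(t) &= \exp[\mu_1(e^t - 1)] \cdots \exp[\mu_n(e^t - 1)] \\ &= \exp[(\mu_1 + \cdots + \mu_n)(e^t - 1)] \end{aligned}$$

which shows that $Y \sim \mathrm{POI}(\mu_1 + \cdots + \mu_n)$.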
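The same kind of direct check works for the Poisson example. This sketch assumes illustrative means 0.5, 1.2, and 2.0 and compares the convolved pmf with $\mathrm{POI}(3.7)$ on $0, 1, \ldots, 30$. Truncating at 30 is harmless here, because $P[X_1 + X_2 = s]$ depends only on values at most $s$. The helper names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(mu, kmax):
    """List of P[X = k], k = 0..kmax, for X ~ POI(mu)."""
    return [exp(-mu) * mu**k / factorial(k) for k in range(kmax + 1)]


def convolve_upto(a, b, kmax):
    """P[X1 + X2 = s] for s = 0..kmax. Truncating both pmfs at kmax loses
    nothing here, because P[X1 + X2 = s] only uses terms with x <= s."""
    return [sum(a[x] * b[s - x] for x in range(s + 1)) for s in range(kmax + 1)]


mus = [0.5, 1.2, 2.0]              # mu_1, mu_2, mu_3, chosen for illustration
kmax = 30
pmf_sum = poisson_pmf(mus[0], kmax)
for mu in mus[1:]:
    pmf_sum = convolve_upto(pmf_sum, poisson_pmf(mu, kmax), kmax)

target = poisson_pmf(sum(mus), kmax)   # POI(mu_1 + mu_2 + mu_3)
print(max(abs(u - v) for u, v in zip(pmf_sum, target)))
```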


Example 6.4.6 Suppose that $X_1, \ldots, X_n$ are independent gamma-distributed random variables with respective shape parameters $\kappa_1, \kappa_2, \ldots, \kappa_n$ and common scale parameter $\theta$, $X_i \sim \mathrm{GAM}(\theta, \kappa_i)$ for $i = 1, \ldots, n$. The MGF of $X_i$ is

$$M_{X_i}(t) = (1 - \theta t)^{-\kappa_i} \qquad t < 1/\theta$$

If $Y = \sum_{i=1}^{n} X_i$, then the MGF of $Y$ is

$$\begin{aligned} M_Y(t) &= (1 - \theta t)^{-\kappa_1} \cdots (1 - \theta t)^{-\kappa_n} \\ &= (1 - \theta t)^{-(\kappa_1 + \cdots + \kappa_n)} \end{aligned}$$

and consequently, $Y \sim \mathrm{GAM}(\theta, \kappa_1 + \cdots + \kappa_n)$. Of course, this is consistent with some earlier examples.
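For the gamma case, a simulation sketch compares the sum with $\mathrm{GAM}(\theta, \kappa_1 + \cdots + \kappa_n)$. The values $\theta = 2$ and $\kappa = (0.5, 1.5, 3.0)$ are assumptions for illustration, chosen so that the total shape is the integer 5; that allows the closed-form CDF $1 - \sum_{j=0}^{4} e^{-y/\theta}(y/\theta)^j/j!$ to be compared with the empirical proportion. Note that `random.gammavariate` takes (shape, scale), the reverse of the $\mathrm{GAM}(\theta, \kappa)$ order used in the text; `gamma_cdf_integer_shape` is a hypothetical helper.

```python
import math
import random
import statistics

theta = 2.0                 # common scale parameter
kappas = [0.5, 1.5, 3.0]    # shape parameters kappa_i, chosen for illustration
k_total = sum(kappas)       # 5.0, so Y should be GAM(2, 5)
rng = random.Random(7)

# random.gammavariate takes (shape, scale): the reverse of the GAM(theta, kappa) order
ys = [sum(rng.gammavariate(k, theta) for k in kappas) for _ in range(100_000)]


def gamma_cdf_integer_shape(y, theta, k):
    """CDF of GAM(theta, k) for integer k: 1 - sum_{j<k} e^{-y/theta} (y/theta)^j / j!."""
    z = y / theta
    return 1.0 - sum(math.exp(-z) * z**j / math.factorial(j) for j in range(k))


print("mean:", statistics.fmean(ys), "theory:", theta * k_total)
print("var: ", statistics.variance(ys), "theory:", theta**2 * k_total)
y0 = 10.0
print("P[Y <= 10]:", sum(y <= y0 for y in ys) / len(ys),
      "theory:", gamma_cdf_integer_shape(y0, theta, round(k_total)))
```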


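Example 6.4.7 Let $X_1, \ldots, X_n$ be independent normally distributed random variables, $X_i \sim N(\mu_i, \sigma_i^2)$, and let $Y = \sum_{i=1}^{n} X_i$. The MGF of $X_i$ is

$$M_{X_i}(t) = \exp(\mu_i t + \sigma_i^2 t^2/2)$$

and thus the MGF of $Y$ is

$$\begin{aligned} M_Y(t) &= \exp(\mu_1 t + \sigma_1^2 t^2/2) \cdots \exp(\mu_n t + \sigma_n^2 t^2/2) \\ &= \exp(\mu_1 t + \sigma_1^2 t^2/2 + \cdots + \mu_n t + \sigma_n^2 t^2/2) \\ &= \exp[(\mu_1 + \cdots + \mu_n)t + (\sigma_1^2 + \cdots + \sigma_n^2)t^2/2] \end{aligned}$$

which shows that $Y \sim N(\mu_1 + \cdots + \mu_n, \sigma_1^2 + \cdots + \sigma_n^2)$.

This includes the special case of a random sample $X_1, \ldots, X_n$ from a normally distributed population, say $X_i \sim N(\mu, \sigma^2)$. In this case, $\mu_i = \mu$ and $\sigma_i^2 = \sigma^2$ for all $i = 1, \ldots, n$, and consequently $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$. It also follows readily in this case that the sample mean, $\bar{X} = \sum_{i=1}^{n} X_i/n$, is normally distributed, $\bar{X} \sim N(\mu, \sigma^2/n)$.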
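The sampling distribution $\bar{X} \sim N(\mu, \sigma^2/n)$ can be illustrated by simulation. The sketch assumes $\mu = 5$, $\sigma = 0.4$, and $n = 25$ purely for illustration, and checks the mean, the variance $\sigma^2/n = 0.0064$, and the normal-theory coverage of about 95% for the interval $\mu \pm 1.96\,\sigma/\sqrt{n}$.

```python
import random
import statistics

mu, sigma, n = 5.0, 0.4, 25      # illustrative population values and sample size
reps = 20_000
rng = random.Random(3)

# Each replication: the sample mean of n draws from N(mu, sigma^2).
# random.gauss takes the standard deviation, not the variance.
xbars = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
         for _ in range(reps)]

se = sigma / n ** 0.5            # standard deviation of X-bar under N(mu, sigma^2/n)
inside = sum(abs(x - mu) <= 1.96 * se for x in xbars) / reps

print("mean of X-bar:", statistics.fmean(xbars), "theory:", mu)
print("var of X-bar: ", statistics.variance(xbars), "theory:", sigma**2 / n)
print("fraction within 1.96 SE:", inside, "normal theory: about 0.95")
```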


An important application of the transformation method that involves 
ordered random variables is discussed next. 


ORDER STATISTICS 


The concept of a random sample of size $n$ was discussed earlier, and the joint density function of the associated $n$ independent random variables, say $X_1, \ldots, X_n$, is given by

$$f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n) \qquad (6.5.1)$$


For example, if a random sample of five light bulbs is tested, the observed failure times might be (in months) $(x_1, \ldots, x_5) = (5, 11, 4, 100, 17)$. Now, the actual observations would have taken place in the order $x_3 = 4$, $x_1 = 5$, $x_2 = 11$, $x_5 = 17$, and $x_4 = 100$. It often is useful to consider the “ordered” random sample of size $n$, denoted by $(x_{1:n}, x_{2:n}, \ldots, x_{n:n})$. That is, in this example $x_{1:5} = x_3 = 4$, $x_{2:5} = x_1 = 5$, $x_{3:5} = x_2 = 11$, $x_{4:5} = x_5 = 17$, and $x_{5:5} = x_4 = 100$. Because we do not really care which bulbs happened to be labeled number 1, number 2, and so on, one could equivalently record the ordered data as it was taken without keeping track of the initial labeling. In some cases one may desire to stop after the $r$ smallest ordered observations out of $n$ have been observed, because this could result in a great saving of time. In the example, 100 months were required before all five light bulbs failed, but the first four failed in 17 months.

The joint distribution of the ordered variables is not the same as the joint density of the unordered variables. For example, the 5! different permutations of a sample of five observations would correspond to just one ordered result. This suggests the result of the following theorem. We will consider a transformation that orders the values $x_1, x_2, \ldots, x_n$. For example,

$$\begin{aligned} y_1 &= u_1(x_1, x_2, \ldots, x_n) = \min(x_1, x_2, \ldots, x_n) \\ y_n &= u_n(x_1, x_2, \ldots, x_n) = \max(x_1, x_2, \ldots, x_n) \end{aligned}$$

and in general $y_i = u_i(x_1, x_2, \ldots, x_n)$ represents the $i$th smallest of $x_1, x_2, \ldots, x_n$. For an example of this transformation, see the above light-bulb data. Sometimes we will use the notation $x_{i:n}$ for $u_i(x_1, x_2, \ldots, x_n)$, but ordinarily we will use the simpler notation $y_i$. Similarly, when this transformation is applied to a random sample $X_1, X_2, \ldots, X_n$, we will obtain a set of ordered random variables, called the order statistics and denoted by either $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ or $Y_1, Y_2, \ldots, Y_n$.


Theorem 6.5.1 If $X_1, X_2, \ldots, X_n$ is a random sample from a population with continuous pdf $f(x)$, then the joint pdf of the order statistics $Y_1, Y_2, \ldots, Y_n$ is

$$g(y_1, y_2, \ldots, y_n) = n!\, f(y_1) f(y_2) \cdots f(y_n) \qquad (6.5.2)$$

if $y_1 < y_2 < \cdots < y_n$, and zero otherwise.


This is an example of a transformation of continuous random variables that is not one-to-one, and it may be carried out by partitioning the domain into subsets $A_1, A_2, \ldots$ such that the transformation is one-to-one on each subset, and then summing as suggested by equation (6.3.20).

Rather than attempting a general proof of the theorem, we will illustrate it for the case $n = 3$. In this case, the sample space can be partitioned into the following $3! = 6$ disjoint sets:

$$\begin{aligned} A_1 &= \{(x_1, x_2, x_3) \mid x_1 < x_2 < x_3\} \\ A_2 &= \{(x_1, x_2, x_3) \mid x_2 < x_1 < x_3\} \\ A_3 &= \{(x_1, x_2, x_3) \mid x_1 < x_3 < x_2\} \\ A_4 &= \{(x_1, x_2, x_3) \mid x_2 < x_3 < x_1\} \\ A_5 &= \{(x_1, x_2, x_3) \mid x_3 < x_1 < x_2\} \\ A_6 &= \{(x_1, x_2, x_3) \mid x_3 < x_2 < x_1\} \end{aligned}$$

and the range of the transformation is $B = \{(y_1, y_2, y_3) \mid y_1 < y_2 < y_3\}$.
In transforming to the ordered random sample, we have the one-to-one transformation

$$\begin{aligned} &Y_1 = X_1,\ Y_2 = X_2,\ Y_3 = X_3 \quad \text{with } J_1 = 1 \text{ on } A_1 \\ &Y_1 = X_2,\ Y_2 = X_1,\ Y_3 = X_3 \quad \text{with } J_2 = -1 \text{ on } A_2 \\ &Y_1 = X_1,\ Y_2 = X_3,\ Y_3 = X_2 \quad \text{with } J_3 = -1 \text{ on } A_3 \end{aligned}$$

and so forth. Notice that in each case $|J_j| = 1$. Furthermore, for each region, the joint pdf is the product of factors $f(y_i)$ multiplied in some order, but it can be written as $f(y_1)f(y_2)f(y_3)$ regardless of the order. If we sum over all $3! = 6$ subsets, then the joint pdf of $Y_1$, $Y_2$, and $Y_3$ is

$$g(y_1, y_2, y_3) = \sum_{j=1}^{6} f(y_1)f(y_2)f(y_3) = 3!\, f(y_1)f(y_2)f(y_3) \qquad y_1 < y_2 < y_3$$

and zero otherwise. The argument for a sample of size $n$, as given by equation (6.5.2), is similar.


Example 6.5.1 Suppose that $X_1$, $X_2$, and $X_3$ represent a random sample of size 3 from a population with pdf

$$f(x) = 2x \qquad 0 < x < 1$$

It follows that the joint pdf of the order statistics $Y_1$, $Y_2$, and $Y_3$ is

$$g(y_1, y_2, y_3) = 3!(2y_1)(2y_2)(2y_3) = 48 y_1 y_2 y_3 \qquad 0 < y_1 < y_2 < y_3 < 1$$

and zero otherwise.

Quite often one may be interested in the marginal density of a single order statistic, say $Y_k$, and this density can be obtained in the usual fashion by integrating over the other variables. In this example, let us find the marginal pdf of the smallest order statistic, $Y_1$:

$$g_1(y_1) = \int_{y_1}^{1} \int_{y_2}^{1} 48 y_1 y_2 y_3 \, dy_3 \, dy_2 = 6y_1(1 - y_1^2)^2 \qquad 0 < y_1 < 1$$

If we want to know the probability that the smallest observation is below some value, say 0.1, it follows that

$$P[Y_1 < 0.1] = \int_0^{0.1} g_1(y_1)\, dy_1 = 0.030$$
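The value 0.030 can be confirmed in three ways. The substitution $u = y_1^2$ gives the exact value $1 - (1 - 0.1^2)^3 = 0.029701$; a midpoint rule approximates the integral of $g_1$; and a simulation draws samples using $X = \sqrt{U}$ with $U \sim \mathrm{UNIF}(0, 1)$, which has CDF $x^2$ and hence pdf $2x$. The seed and replication count are arbitrary assumptions.

```python
import random


def g1(y):
    """Marginal pdf of the smallest of 3 observations from f(x) = 2x on (0, 1)."""
    return 6 * y * (1 - y**2) ** 2


# Exact value via the substitution u = y^2
exact = 1 - (1 - 0.1**2) ** 3

# Midpoint-rule approximation of the integral of g1 over (0, 0.1)
m = 10_000
h = 0.1 / m
numeric = sum(g1((i + 0.5) * h) for i in range(m)) * h

# Simulation: X = sqrt(U), U ~ UNIF(0, 1), has CDF x^2 and hence pdf 2x
rng = random.Random(11)
reps = 200_000
hits = sum(min(rng.random() ** 0.5 for _ in range(3)) < 0.1 for _ in range(reps))

print(f"exact {exact:.6f}, numeric {numeric:.6f}, simulated {hits / reps:.4f}")
```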


It is possible to derive an explicit general formula for the distribution of the $k$th order statistic in terms of the pdf, $f(x)$, and CDF, $F(x)$, of the population random variable $X$. If $X$ is a continuous random variable with $f(x) > 0$ on $a < x < b$ ($a$ may be $-\infty$ and $b$ may be $\infty$), then, for example, for $n = 3$,

$$\begin{aligned} g_1(y_1) &= \int_{y_1}^{b} \int_{y_2}^{b} 3!\, f(y_1) f(y_2) f(y_3) \, dy_3 \, dy_2 \\ &= 3!\, f(y_1) \int_{y_1}^{b} f(y_2)[F(b) - F(y_2)] \, dy_2 \\ &= -3!\, f(y_1) \left. \frac{[1 - F(y_2)]^2}{2} \right|_{y_1}^{b} \\ &= 3 f(y_1)[1 - F(y_1)]^2 \qquad a < y_1 < b \end{aligned}$$

Similarly,

$$\begin{aligned} g_2(y_2) &= \int_{a}^{y_2} \int_{y_2}^{b} 3!\, f(y_1) f(y_2) f(y_3) \, dy_3 \, dy_1 \\ &= 3!\, f(y_2)[F(y_2) - F(a)] \int_{y_2}^{b} f(y_3) \, dy_3 \\ &= 3!\, f(y_2)[1 - F(y_2)] F(y_2) \qquad a < y_2 < b \end{aligned}$$

where $F(a) = 0$ and $F(b) = 1$.


These results may be generalized to the n-dimensional case to obtain the fol- 
lowing theorem. 


Theorem 6.5.2 Suppose that $X_1, \ldots, X_n$ denotes a random sample of size $n$ from a continuous pdf, $f(x)$, where $f(x) > 0$ for $a < x < b$. Then the pdf of the $k$th order statistic $Y_k$ is given by

$$g_k(y_k) = \frac{n!}{(k-1)!\,(n-k)!} [F(y_k)]^{k-1} [1 - F(y_k)]^{n-k} f(y_k) \qquad (6.5.3)$$

if $a < y_k < b$, and zero otherwise.
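Equation (6.5.3) translates directly into a short function. The sketch below (helper names hypothetical) uses the population $f(x) = 2x$, $F(x) = x^2$ on $(0, 1)$ from the earlier example, checks numerically that each $g_k$ integrates to 1 for $n = 3$, and confirms that $k = 1$ reproduces $g_1(y_1) = 6y_1(1 - y_1^2)^2$.

```python
from math import factorial


def order_stat_pdf(y, k, n, pdf, cdf):
    """Equation (6.5.3): pdf of the kth order statistic from a sample of size n."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    F = cdf(y)
    return c * F ** (k - 1) * (1 - F) ** (n - k) * pdf(y)


def integrate(fun, a, b, m=20_000):
    """Midpoint-rule approximation of the integral of fun over (a, b)."""
    h = (b - a) / m
    return sum(fun(a + (i + 0.5) * h) for i in range(m)) * h


def pdf(x):
    return 2 * x        # f(x) = 2x on (0, 1), the population used above


def cdf(x):
    return x * x        # F(x) = x^2 on (0, 1)


n = 3
for k in range(1, n + 1):
    area = integrate(lambda y: order_stat_pdf(y, k, n, pdf, cdf), 0.0, 1.0)
    print(f"k = {k}: integral of g_k = {area:.6f}")   # each should be close to 1

# With k = 1 and n = 3 this should agree with g_1(y_1) = 6 y_1 (1 - y_1^2)^2
print(order_stat_pdf(0.3, 1, 3, pdf, cdf), 6 * 0.3 * (1 - 0.3**2) ** 2)
```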


An interesting heuristic argument can be given, based on the notion that the “likelihood” of an observation is assigned by the pdf. To have $Y_k = y_k$, one must have $k - 1$ observations less than $y_k$, one at $y_k$, and $n - k$ observations greater than $y_k$, where $P[X < y_k] = F(y_k)$, $P[X > y_k] = 1 - F(y_k)$, and the likelihood of an observation at $y_k$ is $f(y_k)$. There are $n!/[(k-1)!\,1!\,(n-k)!]$ possible orderings of the $n$ independent observations, and $g_k(y_k)$ is given by the multinomial expression (6.5.3). This is illustrated in Figure 6.6.

FIGURE 6.6 The $k$th ordered observation: $k - 1$ observations below $y_k$, one at $y_k$, and $n - k$ above

A similar argument can be used to easily give the joint pdf of any set of order statistics. For example, consider a pair of order statistics $Y_i$ and $Y_j$ where $i < j$. To have $Y_i = y_i$ and $Y_j = y_j$, one must have $i - 1$ observations less than $y_i$, one at $y_i$, $j - i - 1$ between $y_i$ and $y_j$, one at $y_j$, and $n - j$ greater than $y_j$. Applying the multinomial form gives the joint pdf for $Y_i$ and $Y_j$ as

$$\begin{aligned} g_{ij}(y_i, y_j) = {} & \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} [F(y_i)]^{i-1} f(y_i) \\ & \times [F(y_j) - F(y_i)]^{j-i-1} [1 - F(y_j)]^{n-j} f(y_j) \qquad (6.5.4) \end{aligned}$$

if $a < y_i < y_j < b$, and zero otherwise. This is illustrated by Figure 6.7.

FIGURE 6.7 The $i$th and $j$th ordered observations: $i - 1$ below $y_i$, one at $y_i$, $j - i - 1$ between, one at $y_j$, and $n - j$ above

The smallest and largest order statistics are of special importance, as are certain functions of order statistics known as the sample median and range. If $n$ is odd, then the sample median is the middle observation, $Y_k$, where $k = (n + 1)/2$; if $n$ is even, then it is considered to be any value between the two middle observations $Y_k$ and $Y_{k+1}$, where $k = n/2$, although it is often taken to be their average. The sample range is the difference between the largest and the smallest observations, $R = Y_n - Y_1$. For continuous random variables, the pdf's of the minimum and maximum, $Y_1$ and $Y_n$, which are special cases of equation (6.5.3), are

$$g_1(y_1) = n[1 - F(y_1)]^{n-1} f(y_1) \qquad a < y_1 < b \qquad (6.5.5)$$

and

$$g_n(y_n) = n[F(y_n)]^{n-1} f(y_n) \qquad a < y_n < b \qquad (6.5.6)$$


For discrete and continuous random variables, the CDF of the minimum or maximum of the sample can be derived directly by following the CDF technique. For the minimum,

$$\begin{aligned} G_1(y_1) &= P[Y_1 \le y_1] \\ &= 1 - P[Y_1 > y_1] \\ &= 1 - P[\text{all } X_i > y_1] \\ &= 1 - [1 - F(y_1)]^n \qquad (6.5.7) \end{aligned}$$


For the maximum,

$$\begin{aligned} G_n(y_n) &= P[Y_n \le y_n] \\ &= P[\text{all } X_i \le y_n] \\ &= [F(y_n)]^n \qquad (6.5.8) \end{aligned}$$

Following similar arguments, it is possible to express the CDF of the $k$th order statistic. In this case we have $Y_k \le y_k$ if $k$ or more $X_i$ are at most $y_k$, where the number of $X_i$ that are at most $y_k$ follows a binomial distribution with parameters $n$ and $p = F(y_k)$.

That is, let $A_j$ denote the event that exactly $j$ of the $X_i$'s are less than or equal to $y_k$, and let $B$ denote the event that $Y_k \le y_k$; then

$$B = \bigcup_{j=k}^{n} A_j$$

where the $A_j$ are disjoint and $P(A_j) = \binom{n}{j} p^j (1 - p)^{n-j}$. It follows that $P(B) = \sum_{j=k}^{n} P(A_j)$, which gives the result stated in the following theorem.

Theorem 6.5.3 For a random sample of size $n$ from a discrete or continuous CDF, $F(x)$, the marginal CDF of the $k$th order statistic is given by

$$G_k(y_k) = \sum_{j=k}^{n} \binom{n}{j} [F(y_k)]^j [1 - F(y_k)]^{n-j} \qquad (6.5.9)$$
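Equation (6.5.9) is a binomial upper-tail sum, so it is easy to code. The sketch below (function name hypothetical, with $F(x) = x^2$ on $(0, 1)$, $n = 5$, and $y = 0.6$ chosen for illustration) checks that $k = 1$ and $k = n$ reduce to (6.5.7) and (6.5.8), and compares $k = 3$ with a simulation.

```python
import random
from math import comb


def order_stat_cdf(y, k, n, cdf):
    """Equation (6.5.9): P[Y_k <= y] as a binomial upper tail with p = F(y)."""
    p = cdf(y)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def cdf(x):
    """F(x) = x^2 on (0, 1), clipped to [0, 1] outside."""
    return min(max(x, 0.0), 1.0) ** 2


n, y = 5, 0.6

# k = 1 and k = n should reproduce equations (6.5.7) and (6.5.8)
print(order_stat_cdf(y, 1, n, cdf), 1 - (1 - cdf(y)) ** n)
print(order_stat_cdf(y, n, n, cdf), cdf(y) ** n)

# Monte Carlo check for k = 3 (the sample median when n = 5);
# sqrt(U) with U ~ UNIF(0, 1) has CDF x^2
rng = random.Random(5)
reps = 100_000
hits = 0
for _ in range(reps):
    sample = sorted(rng.random() ** 0.5 for _ in range(n))
    hits += sample[2] <= y
print(order_stat_cdf(y, 3, n, cdf), hits / reps)
```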


Example 6.5.2 Consider the result of two rolls of the four-sided die in Example 2.1.1. The graph of the CDF of the maximum is shown in Figure 2.3. Although this function was obtained numerically from a table of the pdf, we can obtain an analytic expression using equation (6.5.8). Specifically, let $X_1$ and $X_2$ represent a random sample of size 2 from the discrete uniform distribution, $X_i \sim \mathrm{DU}(4)$. The CDF of $X_i$ is $F(x) = [x]/4$ for $1 \le x \le 4$, where $[x]$ is the greatest integer not exceeding $x$. If $Y_2 = \max(X_1, X_2)$, then $G_2(y_2) = ([y_2]/4)^2$ for $1 \le y_2 \le 4$, according to equation (6.5.8). The CDF of the minimum, $Y_1 = \min(X_1, X_2)$, would be given by $G_1(y_1) = 1 - (1 - [y_1]/4)^2$ for $1 \le y_1 \le 4$, according to equation (6.5.7).
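Because the die example involves only 16 equally likely outcomes, the formulas from (6.5.7) and (6.5.8) can be checked by brute-force enumeration. The sketch below is an illustration only; `G_max` and `G_min` are hypothetical names.

```python
from itertools import product
from math import floor

faces = range(1, 5)                         # four-sided die, DU(4)
outcomes = list(product(faces, repeat=2))   # 16 equally likely (x1, x2) pairs


def G_max(y):
    """Exact P[max(X1, X2) <= y] by enumeration."""
    return sum(max(o) <= y for o in outcomes) / len(outcomes)


def G_min(y):
    """Exact P[min(X1, X2) <= y] by enumeration."""
    return sum(min(o) <= y for o in outcomes) / len(outcomes)


for y in (1, 1.5, 2, 3, 3.7, 4):
    F = floor(y) / 4                        # F(x) = [x]/4
    print(y, G_max(y), F**2, G_min(y), 1 - (1 - F) ** 2)
```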


Example 6.5.3 Consider a random sample of size $n$ from a distribution with pdf and CDF given by $f(x) = 2x$ and $F(x) = x^2$; $0 < x < 1$. From equations (6.5.5) and (6.5.6), we have that

$$g_1(y_1) = 2n y_1 (1 - y_1^2)^{n-1} \qquad 0 < y_1 < 1$$

and

$$g_n(y_n) = 2n y_n (y_n^2)^{n-1} = 2n y_n^{2n-1} \qquad 0 < y_n < 1$$

The corresponding CDFs may be obtained by integration or directly from equations (6.5.7) and (6.5.8).


Example 6.5.4 Suppose that in Example 6.5.3 we are interested in the density of the range of the sample, $R = Y_n - Y_1$. From equation (6.5.4), we have

$$g_{1n}(y_1, y_n) = \frac{n!}{(n-2)!} (2y_1)[y_n^2 - y_1^2]^{n-2}(2y_n) \qquad 0 < y_1 < y_n < 1$$

Making the transformation $R = Y_n - Y_1$, $S = Y_1$, yields the inverse transformation $y_1 = s$, $y_n = r + s$, and $|J| = 1$. Thus, the joint pdf of $R$ and $S$ is

$$h(r, s) = \frac{4n!}{(n-2)!}\, s(r + s)[r^2 + 2rs]^{n-2} \qquad 0 < s < 1 - r,\ 0 < r < 1$$

The regions $A$ and $B$ of the transformation are shown in Figure 6.8.

FIGURE 6.8 Regions corresponding to the transformation $r = y_n - y_1$ and $s = y_1$

The marginal density of the range then is given by

$$h(r) = \int_0^{1-r} h(r, s)\, ds$$

For example, for the case $n = 2$, we have

$$h(r) = \int_0^{1-r} 8s(r + s)\, ds = (4/3)(r + 2)(1 - r)^2 \qquad (6.5.10)$$

for $0 < r < 1$.
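The integral for $h(r)$ can also be evaluated numerically from $h(r, s)$, which provides a check on (6.5.10). In this sketch the function names are hypothetical and the midpoint rule is an assumed choice of quadrature.

```python
def h_joint(r, s, n):
    """Joint pdf of R = Y_n - Y_1 and S = Y_1 for f(x) = 2x on (0, 1)."""
    if not (0 < r < 1 and 0 < s < 1 - r):
        return 0.0
    c = 4 * n * (n - 1)                      # 4 n!/(n - 2)!
    return c * s * (r + s) * (r * r + 2 * r * s) ** (n - 2)


def h_range(r, n, m=20_000):
    """Marginal pdf of R: midpoint rule for the integral of h(r, s) over 0 < s < 1 - r."""
    width = (1 - r) / m
    return sum(h_joint(r, (i + 0.5) * width, n) for i in range(m)) * width


for r in (0.1, 0.4, 0.8):
    closed = 4 / 3 * (r + 2) * (1 - r) ** 2   # equation (6.5.10), n = 2
    print(f"r = {r}: numeric {h_range(r, 2):.6f}, closed form {closed:.6f}")
```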
An interesting general expression can be obtained for the marginal CDF of $R$, because

$$\begin{aligned} H_R(r) &= \int_0^{r} \int_{-\infty}^{\infty} g_{1n}(s, x + s)\, ds\, dx \\ &= \int_{-\infty}^{\infty} \int_0^{r} n(n-1) f(s)[F(x + s) - F(s)]^{n-2} f(x + s)\, dx\, ds \\ &= \int_{-\infty}^{\infty} n f(s)[F(r + s) - F(s)]^{n-1}\, ds \qquad (6.5.11) \end{aligned}$$

Note, however, that great care must be taken in applying this formula to an example where the region with $f(x) > 0$ has finite limits.


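Example 6.5.5 Again consider Example 6.5.4. In that case $F(s + r) = 1$ if $s > 1 - r$, so equation (6.5.11) becomes

$$H_R(r) = \int_0^{1-r} n(2s)[(r + s)^2 - s^2]^{n-1}\, ds + \int_{1-r}^{1} n(2s)[1 - s^2]^{n-1}\, ds$$

For the case $n = 2$,

$$H_R(r) = \int_0^{1-r} 4s(r^2 + 2rs)\, ds + \int_{1-r}^{1} 4s(1 - s^2)\, ds = \frac{8r}{3} - 2r^2 + \frac{r^4}{3} \qquad 0 < r < 1$$

which is consistent with the pdf given by equation (6.5.10), because $dH_R(r)/dr = 8/3 - 4r + 4r^3/3 = (4/3)(r + 2)(1 - r)^2$.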
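Equation (6.5.11) can be evaluated numerically if $F$ is clipped to $[0, 1]$, which builds in the condition $F(s + r) = 1$ for $s > 1 - r$ noted above. The sketch below (helper names hypothetical) compares that integral for $n = 2$ with the closed form $8r/3 - 2r^2 + r^4/3$ and with a simulation that draws pairs from $f(x) = 2x$ via $X = \sqrt{U}$.

```python
import random


def F(x):
    """Population CDF F(x) = x^2 on (0, 1), clipped so F(s + r) = 1 when s > 1 - r."""
    return min(max(x, 0.0), 1.0) ** 2


def f(x):
    """Population pdf f(x) = 2x on (0, 1)."""
    return 2 * x if 0 < x < 1 else 0.0


def H_range(r, n, m=20_000):
    """Equation (6.5.11) by the midpoint rule over the support (0, 1) of f."""
    step = 1.0 / m
    total = 0.0
    for i in range(m):
        s = (i + 0.5) * step
        total += n * f(s) * (F(r + s) - F(s)) ** (n - 1)
    return total * step


def H_closed(r):
    """Closed form for n = 2 derived in the example."""
    return 8 * r / 3 - 2 * r**2 + r**4 / 3


rng = random.Random(2)
reps = 100_000
draws = [(rng.random() ** 0.5, rng.random() ** 0.5) for _ in range(reps)]  # pdf 2x each
for r in (0.2, 0.5, 0.9):
    simulated = sum(abs(a - b) <= r for a, b in draws) / reps   # P[Y_2 - Y_1 <= r]
    print(f"r = {r}: integral {H_range(r, 2):.6f}, "
          f"closed form {H_closed(r):.6f}, simulated {simulated:.4f}")
```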


CENSORED SAMPLING 


As mentioned earlier, in certain types of problems such as life-testing experiments, the ordered observations may occur naturally. In such cases great savings in time and cost may be realized by terminating the experiment after only the first $r$ ordered observations have occurred, rather than waiting for all $n$ failures to occur. This usually is referred to as Type II censored sampling. In this case, the joint marginal density function of the first $r$ order statistics may be obtained by integrating over the remaining variables. Censored sampling is applicable to many different types of problems, but for convenience the variable will be referred to as “time” in the following discussion.

Theorem 6.5.4 Type II Censored Sampling The joint marginal density function of the first $r$ order statistics from a random sample of size $n$ from a continuous pdf, $f(x)$, is given by

$$f(y_1, \ldots, y_r) = \frac{n!}{(n - r)!} [1 - F(y_r)]^{n-r} \prod_{i=1}^{r} f(y_i) \qquad (6.5.12)$$

if $-\infty < y_1 < \cdots < y_r < \infty$ and zero otherwise.
In Type II censored sampling the number of observations, $r$, is fixed but the length of the experiment, $Y_r$, is a random variable. If one terminates the experiment after a fixed time, $t_0$, this procedure is referred to as Type I censored sampling. In this case the number of observations, $R$, is a random variable. The probability that a failure occurs before time $t_0$ for any given trial is $p = F(t_0)$, so for a random sample of size $n$ the random variable $R$ follows a binomial distribution:

$$R \sim \mathrm{BIN}(n, F(t_0)) \qquad (6.5.13)$$

Type I censored sampling is related to the concept of truncated sampling and truncated distributions. Consider a random variable $X$ with pdf $f(x)$ and CDF $F(x)$. If it is given that a random variable from this distribution has a value less than $t_0$, then the CDF of $X$ given $X \le t_0$ is referred to as the truncated distribution of $X$, truncated on the right at $t_0$, and is given by

$$F(x \mid x \le t_0) = \frac{P[X \le x,\ X \le t_0]}{P[X \le t_0]} = \frac{F(x)}{F(t_0)} \qquad 0 < x < t_0 \qquad (6.5.14)$$

and

$$f(x \mid x \le t_0) = \frac{f(x)}{F(t_0)} \qquad 0 < x < t_0$$


Distributions truncated on the left are defined similarly. 

Now, consider a random sample of size $n$ from $f(x)$, and suppose it is given that $r$ observations occur before the truncation point $t_0$; then, given $R = r$, the joint conditional density function of these values, say $x_1, \ldots, x_r$, is given by

$$f(x_1, \ldots, x_r \mid r) = \prod_{i=1}^{r} \frac{f(x_i)}{F(t_0)} = \frac{1}{[F(t_0)]^r} \prod_{i=1}^{r} f(x_i) \qquad (6.5.15)$$

if all $x_i < t_0$, and zero otherwise.

Equation (6.5.15) also would be the density function of a random sample of size $r$, when the parent population density function is assumed to be the truncated density function $f(x)/F(t_0)$. Thus, (6.5.15) may arise either when the pdf of the population sampled originally is in the form of a truncated density or when the original population density is not truncated but the observed sample values are restricted or truncated. This restriction could result from any of several reasons, including limitations of measuring devices.

Thus, one could have a sample of size $r$ from a truncated density or a truncated sample from a regular density. In the first case, equation (6.5.15) provides the usual density function for a random sample of size $r$, and in the second case it provides the conditional density function for the $r$ observations that were given to have values less than $t_0$. It is interesting to note that equation (6.5.15) does not involve the original sample size $n$. Indeed, truncated sampling may occur in two slightly different ways. Suppose that the failure time of a unit follows the density $f(x)$, and that the unit is guaranteed for $t_0$ years. If a unit fails under warranty, then it is returned to a certain repair center, and the failure times of these units are recorded until $r$ failure times are observed. The conditional density function of these $r$ failure times then would follow equation (6.5.15), which does not depend on $n$, and the original number of units, $n$, placed in service may be known or unknown. Also note that the data would again naturally occur as ordered data and the original labeling of the original random units placed in service would be unimportant or unknown. Thus, it again would be reasonable to consider directly the joint density of the ordered observations given by

$$f(y_1, \ldots, y_r \mid r) = \frac{r!}{[F(t_0)]^r} \prod_{i=1}^{r} f(y_i) \qquad (6.5.16)$$

if $y_1 < \cdots < y_r < t_0$ and zero otherwise.

In a slightly different setup, one may place a known number of units, say $n$, in service and record failure times until time $t_0$. Again, as mentioned, the conditional density function still is given by equation (6.5.15) [or by equation (6.5.16) for the ordered data], but in this case additional information is available, namely that $n - r$ items survived longer than time $t_0$. This information is not ignored if the unconditional joint density of $Y_1, \ldots, Y_R$ is considered, and this usually is a preferred approach when the sample size $n$ is known. This situation usually is referred to as Type I censored sampling (on the right), rather than truncated sampling.

Theorem 6.5.5 Type I Censored Sampling If $Y_1, \ldots, Y_R$ denote the observed values of a random sample of size $n$ from $f(x)$ that is Type I censored on the right at $t_0$, then the joint pdf of $Y_1, \ldots, Y_R$ is given by

$$f_{Y_1, \ldots, Y_R}(y_1, \ldots, y_r) = \frac{n!}{(n - r)!} [1 - F(t_0)]^{n-r} \prod_{i=1}^{r} f(y_i) \qquad (6.5.17)$$

if $y_1 < \cdots < y_r < t_0$ and $r = 1, 2, \ldots, n$, and

$$P[R = 0] = [1 - F(t_0)]^n$$

Proof
This follows by factoring the joint pdf into the product of the marginal pdf of $R$ with the conditional pdf of $Y_1, \ldots, Y_R$ given $R = r$. Specifically,

$$\begin{aligned} f_{Y_1, \ldots, Y_R}(y_1, \ldots, y_r) &= f(y_1, \ldots, y_r \mid r)\, b(r; n, F(t_0)) \\ &= \frac{r! \prod_{i=1}^{r} f(y_i)}{[F(t_0)]^r} \cdot \frac{n!}{r!\,(n - r)!} [F(t_0)]^r [1 - F(t_0)]^{n-r} \end{aligned}$$

which simplifies to equation (6.5.17).
Note that the forms of equations (6.5.12) and (6.5.17) are quite similar, with $t_0$ replacing $y_r$.

As suggested earlier, we will wish to use sample data to make statistical inferences about the probability model for a given experiment. The joint density function or “likelihood function” of the sample data is the connecting link between the observed data and the mathematical model, and indeed many statistical procedures are expressed directly in terms of the likelihood function of the data. In the case of censored data, equations (6.5.12), (6.5.16), or (6.5.17) give the likelihood function or joint density function of the available ordered data, and statistical or probabilistic results must be based on these equations. Thus it is clear that the type of data available and the methods of sampling can affect the likelihood function of the observed data.


Example 6.5.6 We will assume that failure times of airplane air conditioners follow an exponential model $\mathrm{EXP}(\theta)$. We will study properties of random variables in the next chapter that will help us characterize a distribution and interpret the physical meaning of parameters such as $\theta$. However, for illustration purposes, suppose the manufacturer claims that an exponential distribution with $\theta = 200$ provides a good model for the failure times of such air conditioners, but the mechanics feel $\theta = 150$ provides a better model. Thirteen airplanes were placed in service, and the first 10 air conditioner failure times were as follows (Proschan, 1963):

23, 50, 50, 55, 74, 90, 97, 102, 130, 194

For Type II censored sampling, the likelihood function for the exponential distribution is given by equation (6.5.12) as

$$\begin{aligned} f(y_1, \ldots, y_r; \theta) &= \frac{n!}{(n - r)!} \prod_{i=1}^{r} \frac{1}{\theta} \exp\left(-\frac{y_i}{\theta}\right) \left[\exp\left(-\frac{y_r}{\theta}\right)\right]^{n-r} \\ &= \frac{n!}{(n - r)!\, \theta^r} \exp\left[-\frac{\sum_{i=1}^{r} y_i + (n - r) y_r}{\theta}\right] \end{aligned}$$

For the above data, $r = 10$, $n = 13$, and

$$T = \sum_{i=1}^{10} y_i + (13 - 10) y_{10} = 1447$$

It would be interesting to compare the likelihoods of the observed data assuming $\theta = 200$ and $\theta = 150$. The ratio of the likelihoods is

$$\frac{f(y_1, \ldots, y_{10}; 200)}{f(y_1, \ldots, y_{10}; 150)} = \left(\frac{150}{200}\right)^{10} \exp\left[-1447\left(\frac{1}{200} - \frac{1}{150}\right)\right] = 0.628$$
Thus we see that the observed data values are more likely under the assumption $\theta = 150$ than when $\theta = 200$. Based on these data, it would be reasonable to infer that the exponential model with $\theta = 150$ provides the better model. Indeed, it is possible to show that the value of $\theta$ that yields the maximum value of the likelihood is the value

$$\hat{\theta} = \frac{T}{r} = \frac{1447}{10} = 144.7$$

Thus, if one wished to choose a value of $\theta$ based on these data, the value $\hat{\theta} = 144.7$ seems reasonable.
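The Type II computations in this example can be reproduced in a few lines. The sketch works on the log scale (using `math.lgamma` for the factorials) to avoid overflow; `log_lik_type2` is a hypothetical name, and the data are the ten failure times listed above with $n = 13$.

```python
import math

failures = [23, 50, 50, 55, 74, 90, 97, 102, 130, 194]  # Proschan (1963) data above
n, r = 13, len(failures)

# Total time on test for Type II censoring: sum of the y_i plus (n - r) * y_r
T = sum(failures) + (n - r) * failures[-1]


def log_lik_type2(theta):
    """Log of n!/((n-r)! theta^r) * exp(-T / theta), from equation (6.5.12)."""
    return (math.lgamma(n + 1) - math.lgamma(n - r + 1)
            - r * math.log(theta) - T / theta)


ratio = math.exp(log_lik_type2(200) - log_lik_type2(150))
theta_hat = T / r   # maximizer of the Type II likelihood
print(T, round(ratio, 3), theta_hat)   # 1447 0.628 144.7
```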

For illustration purposes, suppose that Type I censoring had been used and that the experiment had been conducted for 200 flying hours for each plane to obtain the preceding data. The likelihood function now is given by equation (6.5.17):

$$f(y_1, \ldots, y_r; \theta, t_0) = \frac{n!}{(n - r)!\, \theta^r} \exp\left[-\frac{\sum_{i=1}^{r} y_i + (n - r) t_0}{\theta}\right]$$

For our example, $r = 10$, $n = 13$, and $t_0 = 200$. It is interesting that the likelihood function is maximized in this case by the value of $\theta$ given by

$$\hat{\theta} = \frac{\sum_{i=1}^{r} y_i + (n - r) t_0}{r} = 146.5$$


As a final illustration, suppose that a large fleet of planes is placed in service and a repair depot decides to record the failure times that occur before 200 hours. However, some units in service may be taken to a different depot for repair, so it is unknown how many units have not failed after 200 hours. That is, the sample size $n$ is unknown. Given that $r$ ordered observations have been recorded, the conditional likelihood is given by equation (6.5.16):

$$f(y_1, \ldots, y_r; \theta \mid r) = \frac{r! \exp\left(-\sum_{i=1}^{r} y_i/\theta\right)}{\theta^r [1 - \exp(-t_0/\theta)]^r}$$

where $r = 10$ and $t_0 = 200$.

The value of $\theta$ that maximizes this joint pdf cannot be expressed in closed form; however, the approximate value for this case based on the given data is $\hat{\theta} \approx 245$. This value is not too close to the other values obtained, but of course the data were not actually obtained under this mode of sampling. If two different assumptions are made about the same data, then one cannot expect to always get similar results (although the Type I and Type II censoring formulas are quite similar).
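The two remaining estimates can be computed the same way. For Type I censoring the maximizer is in closed form. For the truncated likelihood (6.5.16), differentiating the log-likelihood and multiplying by $\theta^2/r$ gives the equation $\sum y_i/r - \theta + t_0/(e^{t_0/\theta} - 1) = 0$, which the sketch solves by bisection (an assumed choice of root finder; any bracketing method would do, and the helper names are hypothetical). On these data the root is near 244.2, in line with the rough value of about 245 quoted above.

```python
import math

failures = [23, 50, 50, 55, 74, 90, 97, 102, 130, 194]
n, r, t0 = 13, len(failures), 200.0
S = sum(failures)

# Type I censoring: closed-form maximizer [sum y_i + (n - r) t0] / r
theta_type1 = (S + (n - r) * t0) / r


def score_truncated(theta):
    """Derivative of the log of equation (6.5.16) for EXP(theta), times theta^2 / r.
    It is zero at the maximizer; math.expm1(x) computes exp(x) - 1."""
    return S / r - theta + t0 / math.expm1(t0 / theta)


def bisect(fun, lo, hi, tol=1e-8):
    """Plain bisection; assumes fun(lo) > 0 > fun(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fun(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_trunc = bisect(score_truncated, 50.0, 1000.0)
print(theta_type1, round(theta_trunc, 1))
```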
SUMMARY


The main purpose of this chapter was to develop methods for deriving the dis- 
tribution of a function of one or more random variables. The CDF technique is a 
general method that involves expressing the CDF of the “new” random variable 
in terms of the distribution of the “old” random variable (or variables). When one 
k-dimensional vector of random variables (new variables) is defined as a function 
of another k-dimensional vector of random variables (old variables) by means of 
a set of equations, transformation methods make it possible to express the joint 
pdf of the new random variables in terms of the joint pdf of the old random 
variables. The continuous case also involves multiplying by a function called the 
Jacobian of the transformation. A special transformation, called the probability 
integral transformation, and its inverse are useful in applications such as com- 
puter simulation of data. 

The transformation that orders the values in a random sample from smallest to 
largest can be used to define the order statistics. A set of order statistics in which 
a specified subset is not observed is termed a censored sample. This concept is 
useful in applications such as life-testing of manufactured components, where it is 
not feasible to wait for all components to fail before analyzing the data. 


EXERCISES 


1. Let $X$ be a random variable with pdf $f(x) = 4x^3$ if $0 < x < 1$ and zero otherwise. Use the CDF technique to determine the pdf of each of the following random variables:
(a) $Y = X^4$.
(b) $W = e^X$.
(c) $Z = \ln X$.
(d) $U = (X - 0.5)^2$.


2. Let $X$ be a random variable that is uniformly distributed, $X \sim \mathrm{UNIF}(0, 1)$. Use the CDF technique to determine the pdf of each of the following:
(a) $Y = X^{1/4}$.
(b) $W = e^{-X}$.
(c) $Z = 1 - e^{-X}$.
(d) $U = X(1 - X)$.


3. The measured radius of a circle, $R$, has pdf $f(r) = 6r(1 - r)$, $0 < r < 1$.
(a) Find the distribution of the circumference.
(b) Find the distribution of the area of the circle.
4. If $X$ is Weibull distributed, $X \sim \mathrm{WEI}(\theta, \beta)$, find both the CDF and pdf of each of the following:
(a) $Y = (X/\theta)^\beta$.
(b) $W = \ln X$.
(c) $Z = (\ln X)^2$.


5. Prove Theorem 6.3.4, assuming that the CDF $F(x)$ is a one-to-one function.


6. Let $X$ have the pdf given in Exercise 1. Find the transformation $y = u(x)$ such that $Y = u(X) \sim \mathrm{UNIF}(0, 1)$.


7. Let $U \sim \mathrm{UNIF}(0, 1)$. Find transformations $y = G_1(u)$ and $w = G_2(u)$ such that
(a) $Y = G_1(U) \sim \mathrm{EXP}(1)$.
(b) $W = G_2(U) \sim \mathrm{BIN}(3, 1/2)$.


8. Rework Exercise 1 using transformation methods.
9. Rework Exercise 2 using transformation methods.
10. Suppose $X$ has pdf $f(x) = (1/2)\exp(-|x|)$ for all $-\infty < x < \infty$.
(a) Find the pdf of $Y = |X|$.
(b) Let $W = 0$ if $X < 0$ and $W = 1$ if $X > 0$. Find the CDF of $W$.
11. If $X \sim \mathrm{BIN}(n, p)$, then find the pdf of $Y = n - X$.
12. If $X \sim \mathrm{NB}(r, p)$, then find the pdf of $Y = X - r$.
13. Let $X$ have pdf $f(x) = x^2/24$; $-2 < x < 4$ and zero otherwise. Find the pdf of $Y = X^2$.


14. Let $X$ and $Y$ have joint pdf $f(x, y) = 4e^{-2(x+y)}$; $0 < x < \infty$, $0 < y < \infty$, and zero otherwise.
(a) Find the CDF of $W = X + Y$.
(b) Find the joint pdf of $U = X/Y$ and $V = X$.
(c) Find the marginal pdf of $U$.


15. If $X_1$ and $X_2$ denote a random sample of size 2 from a Poisson distribution, $X_i \sim \mathrm{POI}(\lambda)$, find the pdf of $Y = X_1 + X_2$.


16. Let $X_1$ and $X_2$ denote a random sample of size 2 from a distribution with pdf $f(x) = 1/x^2$; $1 < x < \infty$ and zero otherwise.
(a) Find the joint pdf of $U = X_1 X_2$ and $V = X_1$.
(b) Find the marginal pdf of $U$.


17. Suppose that $X_1$ and $X_2$ denote a random sample of size 2 from a gamma distribution, $X_i \sim \mathrm{GAM}(2, 1/2)$.
(a) Find the pdf of $Y = \sqrt{X_1 + X_2}$.
(b) Find the pdf of $W = X_1/X_2$.
18. Let $X$ and $Y$ have joint pdf $f(x, y) = e^{-y}$; $0 < x < y < \infty$ and zero otherwise.
(a) Find the joint pdf of $S = X + Y$ and $T = X$.
(b) Find the marginal pdf of $T$.
(c) Find the marginal pdf of $S$.
19. Suppose that $X_1, X_2, \ldots, X_k$ are independent random variables and let $Y_i = u_i(X_i)$ for $i = 1, 2, \ldots, k$. Show that $Y_1, Y_2, \ldots, Y_k$ are independent. Consider only the case where $X_i$ is continuous and $y_i = u_i(x_i)$ is one-to-one. Hint: If $x_i = w_i(y_i)$ is the inverse transformation, then the Jacobian has the form

$$J = \prod_{i=1}^{k} \frac{d}{dy_i} w_i(y_i)$$
20. Prove Theorem 5.4.5 in the case of discrete random variables $X$ and $Y$. Hint: Use the transformation $s = x$ and $t = g(x)y$.


21. Suppose $X$ and $Y$ are continuous random variables with joint pdf $f(x, y) = 2(x + y)$ if $0 < x < y < 1$ and zero otherwise.
(a) Find the joint pdf of the variables $S = X$ and $T = XY$.
(b) Find the marginal pdf of $T$.


22. As in Exercise 2 of Chapter 5 (page 189), assume the weight (in ounces) of a major league baseball is a random variable, and recall that a carton contains 144 baseballs. Assume now that the weights of individual baseballs are independent and normally distributed with mean $\mu = 5$ and standard deviation $\sigma = 2/5$, and let $T$ represent the total weight of all baseballs in the carton. Find the probability that the total weight of baseballs in a carton is at most 725 ounces.


Suppose that $X_1, X_2, \ldots, X_k$ are independent random variables, and let $Y = X_1 + X_2 + \cdots + X_k$.
If $X_i \sim \mathrm{GEO}(p)$, then find the MGF of $Y$. What is the distribution of $Y$?


Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size $n = 10$ from an exponential distribution
with mean 2, $X_i \sim \mathrm{EXP}(2)$.

(a) Find the MGF of the sum $Y = \sum_{i=1}^{10} X_i$.

(b) What is the pdf of $Y$?


Let $X_1, X_2, X_3$, and $X_4$ be independent random variables. Assume that $X_2$, $X_3$, and $X_4$
each are Poisson distributed with mean 5, and suppose that $Y = X_1 + X_2 + X_3 + X_4 \sim \mathrm{POI}(25)$.

(a) What is the distribution of $X_1$?

(b) What is the distribution of $W = X_1 + X_2$?
Let $X_1$ and $X_2$ be independent negative binomial random variables, $X_1 \sim \mathrm{NB}(r_1, p)$ and
$X_2 \sim \mathrm{NB}(r_2, p)$.

(a) Find the MGF of $Y = X_1 + X_2$.

(b) What is the distribution of $Y$?


Recall that $Y \sim \mathrm{LOGN}(\mu, \sigma^2)$ if $\ln Y \sim N(\mu, \sigma^2)$. Assume that $Y_i \sim \mathrm{LOGN}(\mu_i, \sigma_i^2)$,
$i = 1, \ldots, n$, are independent. Find the distribution of:

(a) $\prod_{i=1}^{n} Y_i$.

(b) $\prod_{i=1}^{n} Y_i^{a}$.

(c) $Y_1/Y_2$.

(d) Find $E\left[\prod_{i=1}^{n} Y_i\right]$.


28. Let $X_1$ and $X_2$ be a random sample of size $n = 2$ from a continuous distribution with pdf
of the form $f(x) = 2x$ if $0 < x < 1$ and zero otherwise.

(a) Find the marginal pdfs of the smallest and largest order statistics, $Y_1$ and $Y_2$.
(b) Find the joint pdf of $Y_1$ and $Y_2$.
(c) Find the pdf of the sample range $R = Y_2 - Y_1$.


29. Consider a random sample of size $n$ from a distribution with pdf $f(x) = 1/x^2$ if $1 < x < \infty$;
zero otherwise.
(a) Give the joint pdf of the order statistics.
(b) Give the pdf of the smallest order statistic, $Y_1$.
(c) Give the pdf of the largest order statistic, $Y_n$.
(d) Derive the pdf of the sample range, $R = Y_n - Y_1$, for $n = 2$.
(e) Give the pdf of the sample median, $Y_r$, assuming that $n$ is odd so that $r = (n + 1)/2$.


30. Consider a random sample of size $n = 5$ from a Pareto distribution, $X_i \sim \mathrm{PAR}(1, 2)$.
(a) Give the joint pdf of the second and fourth order statistics, $Y_2$ and $Y_4$.
(b) Give the joint pdf of the first three order statistics, $Y_1$, $Y_2$, and $Y_3$.
(c) Give the CDF of the sample median, $Y_3$.


31. Consider a random sample of size $n$ from an exponential distribution, $X_i \sim \mathrm{EXP}(1)$. Give
the pdf of each of the following:
(a) The smallest order statistic, $Y_1$.
(b) The largest order statistic, $Y_n$.
(c) The sample range, $R = Y_n - Y_1$.
(d) The first $r$ order statistics, $Y_1, \ldots, Y_r$.


32. A system is composed of five independent components connected in series.

(a) If the pdf of the time to failure of each component is exponential, $X_i \sim \mathrm{EXP}(1)$, then
give the pdf of the time to failure of the system.

(b) Repeat (a), but assume that the components are connected in parallel.

(c) Suppose that the five-component system fails when at least three components fail.
Give the pdf of the time to failure of the system.

(d) Suppose that $n$ independent components are not distributed identically, but rather
$X_i \sim \mathrm{EXP}(\theta_i)$. Give the pdf of the time to failure of a series system in this case.


33. Consider a random sample of size $n$ from a geometric distribution, $X_i \sim \mathrm{GEO}(p)$. Give the
CDF of each of the following:
(a) The minimum, $Y_1$.
(b) The $k$th smallest, $Y_k$.
(c) The maximum, $Y_n$.

(d) Find $P[Y_1 \le 1]$.
Suppose $X_1$ and $X_2$ are continuous random variables with joint pdf $f(x_1, x_2)$. Prove
Theorem 5.2.1 assuming the transformation $y_1 = u(x_1, x_2)$, $y_2 = x_2$ is one-to-one. Hint:
First derive the marginal pdf of $Y_1 = u(X_1, X_2)$ and show that $E(Y_1) = \int_{-\infty}^{\infty} y_1 f_{Y_1}(y_1)\,dy_1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} u(x_1, x_2) f(x_1, x_2)\,dx_1\,dx_2$.
Use a similar proof in the case of discrete random variables. Notice that proofs for the
cases of $k$ variables and transformations that are not one-to-one are similar but more
complicated.


Suppose $X_1$, $X_2$ are independent exponentially distributed random variables,
$X_i \sim \mathrm{EXP}(\theta)$, and let $Y = X_1 - X_2$.

(a) Find the MGF of $Y$.

(b) What is the distribution of $Y$?


Show that if $X_1, \ldots, X_n$ are independent random variables with FMGFs $G_1(t), \ldots, G_n(t)$,
respectively, and $Y = X_1 + \cdots + X_n$, then the FMGF of $Y$ is $G_Y(t) = G_1(t) \cdots G_n(t)$.


LIMITING DISTRIBUTIONS


7.1 INTRODUCTION


In Chapter 6, general methods were discussed for deriving the distribution of a
function of $n$ random variables, say $Y_n = u(X_1, \ldots, X_n)$. In some cases, the pdf of
$Y_n$ is obtained easily, but there are many important cases where the derivation is
not tractable. In many of these, it is possible to obtain useful approximate results
that apply when $n$ is large. These results are based on the notions of convergence
in distribution and limiting distribution.
7.2 SEQUENCES OF RANDOM VARIABLES


Consider a sequence of random variables $Y_1, Y_2, \ldots$ with a corresponding
sequence of CDFs $G_1(y), G_2(y), \ldots$ so that for each $n = 1, 2, \ldots$

$$G_n(y) = P[Y_n \le y] \qquad (7.2.1)$$


Definition 7.2.1
If $Y_n \sim G_n(y)$ for each $n = 1, 2, \ldots$, and if for some CDF $G(y)$,

$$\lim_{n\to\infty} G_n(y) = G(y) \qquad (7.2.2)$$

for all values $y$ at which $G(y)$ is continuous, then the sequence $Y_1, Y_2, \ldots$ is said to
converge in distribution to $Y \sim G(y)$, denoted by $Y_n \xrightarrow{d} Y$. The distribution
corresponding to the CDF $G(y)$ is called the limiting distribution of $Y_n$.


Example 7.2.1
Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution, $X_i \sim \mathrm{UNIF}(0, 1)$,
and let $Y_n = X_{n:n}$, the largest order statistic. From the results of Chapter 6, it
follows that the CDF of $Y_n$ is

$$G_n(y) = y^n, \qquad 0 < y < 1 \qquad (7.2.3)$$

zero if $y \le 0$ and one if $y \ge 1$. Of course, when $0 < y < 1$, $y^n$ approaches 0 as $n$
approaches $\infty$, and when $y \le 0$ or $y \ge 1$, $G_n(y)$ is a sequence of constants, with
respective limits 0 or 1. Thus, $\lim_{n\to\infty} G_n(y) = G(y)$ where

$$G(y) = \begin{cases} 0 & y < 1 \\ 1 & y \ge 1 \end{cases} \qquad (7.2.4)$$

This situation is illustrated in Figure 7.1, which shows $G(y)$ and $G_n(y)$ for $n = 2$,
5, and 10.

FIGURE 7.1 Comparison of CDFs $G_n(y)$ with limiting degenerate CDF $G(y)$
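As a quick numerical illustration of Example 7.2.1, the Python sketch below tabulates $G_n(y) = y^n$ from equation (7.2.3) at a few fixed points. The function name `cdf_uniform_max` and the grid of $y$ values are our own illustrative choices. For each fixed $y < 1$ the printed values fall toward 0 as $n$ grows, while $G_n(1) = 1$ for every $n$, in line with the degenerate limit (7.2.4).

```python
def cdf_uniform_max(y: float, n: int) -> float:
    """CDF of the largest order statistic X_{n:n} of n UNIF(0, 1) variables, eq. (7.2.3)."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    return y ** n


# Rows: n; columns: G_n(y) at y = 0.5, 0.9, 0.99, 1.0
for n in (2, 5, 10, 100, 1000):
    print(n, [round(cdf_uniform_max(y, n), 4) for y in (0.5, 0.9, 0.99, 1.0)])
```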


The function defined by equation (7.2.4) is the CDF of a random variable that 
is concentrated at one value, y= 1. Such distributions occur often as limiting 
distributions. 


Definition 7.2.2
The function $G(y)$ is the CDF of a degenerate distribution at the value $y = c$ if

$$G(y) = \begin{cases} 0 & y < c \\ 1 & y \ge c \end{cases} \qquad (7.2.5)$$

In other words, $G(y)$ is the CDF of a discrete distribution that assigns probability
one at the value $y = c$ and zero otherwise.


Example 7.2.2
Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential distribution,
$X_i \sim \mathrm{EXP}(\theta)$, and let $Y_n = X_{1:n}$ be the smallest order statistic. It follows that the
CDF of $Y_n$ is

$$G_n(y) = 1 - e^{-ny/\theta}, \qquad y > 0 \qquad (7.2.6)$$

and zero otherwise. We have $\lim_{n\to\infty} G_n(y) = 1$ if $y > 0$, because $e^{-y/\theta} < 1$ in this
case. Thus, the limit is zero if $y < 0$ and one if $y > 0$, which corresponds to a
degenerate distribution at the value $y = 0$. Notice that the limit at $y = 0$ is zero,
which means that the limiting function is not only discontinuous at $y = 0$ but
also not even continuous from the right at $y = 0$, which is a requirement of a
CDF. This is not a problem, because Definition 7.2.1 requires only that the limiting
function agree with a CDF at its points of continuity.


Definition 7.2.3
A sequence of random variables, $Y_1, Y_2, \ldots$, is said to converge stochastically to a
constant $c$ if it has a limiting distribution that is degenerate at $y = c$.

An alternative formulation of stochastic convergence will be considered in
Section 7.6, and a more general concept called convergence in probability will be
discussed in Section 7.7.

Not all limiting distributions are degenerate, as seen in the next example. The
following limits are useful in many problems:

$$\lim_{n\to\infty}\left(1 + \frac{c}{n}\right)^{nb} = e^{cb} \qquad (7.2.7)$$

$$\lim_{n\to\infty}\left[1 + \frac{c}{n} + \frac{d(n)}{n}\right]^{nb} = e^{cb} \quad \text{if } \lim_{n\to\infty} d(n) = 0 \qquad (7.2.8)$$

These are obtained easily from expansions involving the natural logarithm.
For example, limit (7.2.7) follows from the expansion $nb \ln(1 + c/n) = nb(c/n - c^2/(2n^2) + \cdots) = cb - bc^2/(2n) + \cdots$,
where the rest of the terms approach zero as $n \to \infty$.


Suppose that $X_1, \ldots, X_n$ is a random sample from a Pareto distribution,
$X_i \sim \mathrm{PAR}(1, 1)$, and let $Y_n = nX_{1:n}$. The CDF of $X_i$ is $F(x) = 1 - (1 + x)^{-1}$;
$x > 0$, so the CDF of $Y_n$ is

$$G_n(y) = 1 - \left(1 + \frac{y}{n}\right)^{-n}, \qquad y > 0 \qquad (7.2.9)$$

Using limit (7.2.7), we obtain the limit $G(y) = 1 - e^{-y}$; $y > 0$ and zero otherwise,
which is the CDF of an exponential distribution, EXP(1). This is illustrated in
Figure 7.2, which shows the graphs of $G(y)$ and $G_n(y)$ for $n = 1$, 2, and 5.

FIGURE 7.2 Comparison of CDFs $G_n(y)$ with limiting CDF $G(y)$
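A rough numerical check of the convergence of (7.2.9) to the EXP(1) CDF can be sketched in Python. The helper names `cdf_scaled_min` and `cdf_exp1`, the grid of $y$ values, and the sample sizes are illustrative choices, not part of the text; the sketch reports the largest gap between $G_n(y)$ and $G(y) = 1 - e^{-y}$ over the grid.

```python
import math


def cdf_scaled_min(y: float, n: int) -> float:
    """G_n(y) = 1 - (1 + y/n)^(-n), the CDF of Y_n = n X_{1:n} for X_i ~ PAR(1, 1), eq. (7.2.9)."""
    if y <= 0.0:
        return 0.0
    return 1.0 - (1.0 + y / n) ** (-n)


def cdf_exp1(y: float) -> float:
    """CDF of the EXP(1) limit, G(y) = 1 - e^(-y) for y > 0."""
    return 1.0 - math.exp(-y) if y > 0.0 else 0.0


grid = [0.25 * k for k in range(1, 21)]  # y from 0.25 to 5.0
for n in (1, 2, 5, 50, 500):
    max_gap = max(abs(cdf_scaled_min(y, n) - cdf_exp1(y)) for y in grid)
    print(f"n = {n:4d}  max |G_n(y) - G(y)| on grid = {max_gap:.5f}")
```

Because $(1 + y/n)^n$ increases toward $e^y$, each $G_n(y)$ approaches $G(y)$ from below, so the reported gaps shrink steadily as $n$ grows.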


The following example shows that a sequence of random variables need not
have a limiting distribution.

For the random sample of the previous example, let us consider the largest order
statistic, $Y_n = X_{n:n}$. The CDF of $Y_n$ is

$$G_n(y) = \left(\frac{y}{1 + y}\right)^n, \qquad y > 0 \qquad (7.2.10)$$

and zero otherwise. Because $y/(1 + y) < 1$, we have $\lim_{n\to\infty} G_n(y) = G(y) = 0$ for all $y$,
which is not a CDF because it does not approach one as $y \to \infty$.


In the previous example, suppose that we consider a rescaled variable,
$Y_n = (1/n)X_{n:n}$, which has CDF

$$G_n(y) = \left(1 + \frac{1}{ny}\right)^{-n}, \qquad y > 0 \qquad (7.2.11)$$

and zero otherwise. Using limit (7.2.7), we obtain the CDF $G(y) = e^{-1/y}$; $y > 0$, and zero otherwise.


For the random sample of Example 7.2.2, consider the modified sequence
$Y_n = (1/\theta)X_{n:n} - \ln n$. The CDF is

$$G_n(y) = \left(1 - \frac{e^{-y}}{n}\right)^n, \qquad y > -\ln n \qquad (7.2.12)$$

and zero otherwise. Following from limit (7.2.7), the limiting CDF is
$G(y) = \exp(-e^{-y})$; $-\infty < y < \infty$.


We now illustrate the accuracy when this limiting CDF is used as an approximation
to $G_n(y)$ for large $n$. Suppose that the lifetime in months of a certain type
of component is a random variable $X \sim \mathrm{EXP}(1)$, and suppose that 10 independent
components are connected in a parallel system. The time to failure of the
system is $T = X_{10:10}$, and the CDF is $F_T(t) = (1 - e^{-t})^{10}$; $t > 0$. This CDF is
evaluated at $t = 1$, 2, 5, and 7 months in the table below. To approximate these
probabilities with the limiting distribution, note that with $\theta = 1$ and $n = 10$ we have
$T = Y_{10} + \ln 10$, so

$$F_T(t) = P[T \le t] = P[Y_{10} + \ln 10 \le t] \approx G(t - \ln 10) = \exp\left(-e^{-(t - \ln 10)}\right) = \exp\left(-10e^{-t}\right)$$

The approximate probabilities are given in the table for comparison:

    t:               1        2        5        7
    F_T(t):          0.010    0.234    0.935    0.9909
    G(t - ln 10):    0.025    0.258    0.935    0.9909

The approximation should improve as $n$ increases.
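The table can be reproduced, under the stated assumptions ($X \sim \mathrm{EXP}(1)$, $n = 10$ components in parallel), with a short Python sketch. The names `exact_cdf` and `limit_approx` are illustrative, not from the text. Values are printed to four decimals so they can be compared with the rounded table entries.

```python
import math

N_COMPONENTS = 10  # parallel system of 10 independent EXP(1) components


def exact_cdf(t: float, n: int = N_COMPONENTS) -> float:
    """F_T(t) = (1 - e^{-t})^n for T = X_{n:n}, the system lifetime."""
    return (1.0 - math.exp(-t)) ** n


def limit_approx(t: float, n: int = N_COMPONENTS) -> float:
    """G(t - ln n) = exp(-e^{-(t - ln n)}) = exp(-n e^{-t}), using G(y) = exp(-e^{-y})."""
    return math.exp(-n * math.exp(-t))


print(f"{'t':>3} {'F_T(t)':>8} {'G(t - ln 10)':>13}")
for t in (1, 2, 5, 7):
    print(f"{t:>3} {exact_cdf(t):8.4f} {limit_approx(t):13.4f}")
```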


Consider the sample mean of a random sample from a normal distribution,
$N(\mu, \sigma^2)$, $Y_n = \bar{X}_n$. From the results of the previous chapter, $Y_n \sim N(\mu, \sigma^2/n)$, and

$$G_n(y) = \Phi\left(\frac{\sqrt{n}\,(y - \mu)}{\sigma}\right) \qquad (7.2.13)$$

The limiting CDF is degenerate at $y = \mu$, because $\lim_{n\to\infty} G_n(y) = 0$ if $y < \mu$, 1/2 if
$y = \mu$, and 1 if $y > \mu$, so that the sample mean converges stochastically to $\mu$. We
will show this in a more general setting in a later section.


Certain limiting distributions are easier to derive by using moment generating 
functions. 


7.3 THE CENTRAL LIMIT THEOREM


In the previous examples, the exact CDF was known for each finite $n$, and the
limiting distribution was obtained directly from this sequence. One advantage of
limiting distributions is that it often may be possible to determine the limiting
distribution without knowing the exact form of the CDF for finite $n$. The limiting
distribution then may provide a useful approximation when the exact probabilities
are not available. One method of accomplishing this result is to make use of
MGFs. The following theorem is stated without proof.


Theorem 7.3.1
Let $Y_1, Y_2, \ldots$ be a sequence of random variables with respective CDFs $G_1(y)$,
$G_2(y), \ldots$ and MGFs $M_1(t), M_2(t), \ldots$. If $M(t)$ is the MGF of a CDF $G(y)$, and if
$\lim_{n\to\infty} M_n(t) = M(t)$ for all $t$ in an open interval containing zero, $-h < t < h$, then
$\lim_{n\to\infty} G_n(y) = G(y)$ for all continuity points of $G(y)$.
Example 7.3.1
Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli distribution,
$X_i \sim \mathrm{BIN}(1, p)$, and consider $Y_n = \sum_{i=1}^{n} X_i$. If we let $p \to 0$ as $n \to \infty$ in such a way
that $np = \mu$, for fixed $\mu > 0$, then

$$M_n(t) = (pe^t + q)^n = \left[1 + \frac{\mu(e^t - 1)}{n}\right]^n \qquad (7.3.1)$$

and from limit (7.2.7) we have

$$\lim_{n\to\infty} M_n(t) = e^{\mu(e^t - 1)} \qquad (7.3.2)$$

which is the MGF of the Poisson distribution with mean $\mu$. This is consistent
with the result of Theorem 3.2.3 and is somewhat easier to verify. We conclude
that $Y_n \xrightarrow{d} Y \sim \mathrm{POI}(\mu)$.
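To see limit (7.3.2) numerically, the following Python sketch evaluates the binomial MGF (7.3.1) with $p = \mu/n$ and compares it with the Poisson MGF. The value $\mu = 3$ and the points $t$ are arbitrary choices made for illustration, and the function names are our own.

```python
import math


def binomial_mgf(t: float, n: int, mu: float) -> float:
    """MGF (p e^t + q)^n of Y_n ~ BIN(n, p) with p = mu/n, eq. (7.3.1)."""
    p = mu / n  # p shrinks as n grows so that np = mu stays fixed
    return (p * math.exp(t) + (1.0 - p)) ** n


def poisson_mgf(t: float, mu: float) -> float:
    """MGF exp(mu (e^t - 1)) of POI(mu), the limit (7.3.2)."""
    return math.exp(mu * (math.exp(t) - 1.0))


mu = 3.0
for n in (10, 100, 10_000, 1_000_000):
    errors = [abs(binomial_mgf(t, n, mu) - poisson_mgf(t, mu)) for t in (-0.5, 0.25, 0.5)]
    print(n, max(errors))
```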


Example 7.3.2
Bernoulli Law of Large Numbers Suppose now that we keep $p$ fixed and consider
the sequence of sample proportions, $W_n = \hat{p}_n = Y_n/n$. By using the series
expansion $e^u = 1 + u + u^2/2 + \cdots$ with $u = t/n$, we obtain

$$M_{W_n}(t) = (pe^{t/n} + q)^n = \left[1 + \frac{pt}{n} + \frac{pt^2}{2n^2} + \cdots\right]^n = \left[1 + \frac{pt}{n} + \frac{d(n)}{n}\right]^n \qquad (7.3.3)$$

where $d(n)/n$ involves the disregarded terms of the series expansion, and $d(n) \to 0$
as $n \to \infty$. From limit (7.2.8) we have

$$\lim_{n\to\infty} M_{W_n}(t) = e^{pt} \qquad (7.3.4)$$

which is the MGF of a degenerate distribution at $y = p$, and thus $\hat{p}_n$ converges
stochastically to $p$ as $n$ approaches infinity.


Note that this example provides an approach to answering the question that
was raised in Chapter 1 about statistical regularity. If, in a sequence of $M$ independent
trials of an experiment, $Y_M$ represents the number of occurrences of an
event $A$, then $f_A = Y_M/M$ is the relative frequency of occurrence of $A$. Because the
Bernoulli parameter has the value $p = P(A)$ in this case, it follows that $f_A$ converges
stochastically to $P(A)$ as $M \to \infty$. For example, if a coin is tossed repeatedly,
and $A = \{H\}$, then the successive relative frequencies of $A$ correspond to a
sequence of random variables that will converge stochastically to $p = 1/2$ for an
unbiased coin. Even though different sequences of tosses generally produce different
observed numerical sequences of $f_A$, in the long run they all tend to stabilize
near 1/2.
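The stabilization of relative frequencies can be illustrated by simulation. The sketch below is a minimal Python example that uses Python's pseudorandom generator `random.Random` as a stand-in for a fair coin; the seeds, the number of tosses, and the function name are arbitrary illustrative choices. Different seeds give different sequences $f_A$, but all of them end up near 1/2.

```python
import random


def relative_frequencies(num_tosses: int, p: float = 0.5, seed: int = 1) -> list[float]:
    """Successive relative frequencies f_A = Y_M / M of A = {H}, for M = 1, ..., num_tosses."""
    rng = random.Random(seed)  # seeded generator, so each run is reproducible
    heads = 0
    freqs = []
    for m in range(1, num_tosses + 1):
        if rng.random() < p:  # one Bernoulli(p) trial
            heads += 1
        freqs.append(heads / m)
    return freqs


for seed in (1, 2, 3):
    f = relative_frequencies(100_000, seed=seed)
    print(seed, [round(f[m - 1], 4) for m in (10, 100, 1_000, 100_000)])
```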


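Example 7.3.3
Now we consider the sequence of “standardized” variables:

$$Z_n = \frac{Y_n - np}{\sqrt{npq}} \qquad (7.3.5)$$

With the simplified notation $\sigma_n = \sqrt{npq}$, we have $Z_n = Y_n/\sigma_n - np/\sigma_n$. Using the
series expansion of the previous example,

$$\begin{aligned}
M_{Z_n}(t) &= e^{-npt/\sigma_n}\,(pe^{t/\sigma_n} + q)^n \\
&= \left[e^{-pt/\sigma_n}\,(pe^{t/\sigma_n} + q)\right]^n \\
&= \left[pe^{qt/\sigma_n} + qe^{-pt/\sigma_n}\right]^n \\
&= \left[1 + \frac{t^2}{2n} + \frac{d(n)}{n}\right]^n
\end{aligned} \qquad (7.3.6)$$

(in the last step the first-order terms cancel, and the second-order terms sum to
$pq(q + p)t^2/(2\sigma_n^2) = t^2/(2n)$), where $d(n) \to 0$ as $n \to \infty$. Thus,

$$\lim_{n\to\infty} M_{Z_n}(t) = e^{t^2/2} \qquad (7.3.7)$$

which is the MGF of the standard normal distribution, and so $Z_n \xrightarrow{d} Z \sim N(0, 1)$.
This is an example of a special limiting result known as the Central Limit
Theorem.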


Central Limit Theorem (CLT) If $X_1, \ldots, X_n$ is a random sample from a distribution
with mean $\mu$ and variance $\sigma^2 < \infty$, then the limiting distribution of

$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma} \qquad (7.3.8)$$

is the standard normal, $Z_n \xrightarrow{d} Z \sim N(0, 1)$ as $n \to \infty$.
Proof
This limiting result holds for random samples from any distribution with finite
mean and variance, but the proof will be outlined under the stronger assumption
that the MGF of the distribution exists. The proof can be modified for the more
general case by using a more general concept called a characteristic function,
which we will not consider here.

Let $m(t)$ denote the MGF of $X - \mu$, $m(t) = M_{X-\mu}(t)$, and note that $m(0) = 1$,
$m'(0) = E(X - \mu) = 0$, and $m''(0) = E(X - \mu)^2 = \sigma^2$. Expanding $m(t)$ by the Taylor
series formula about 0 gives, for $\xi$ between 0 and $t$,

$$m(t) = m(0) + m'(0)t + \frac{m''(\xi)t^2}{2} = 1 + \frac{\sigma^2 t^2}{2} + \frac{[m''(\xi) - \sigma^2]\,t^2}{2} \qquad (7.3.9)$$

by adding and subtracting $\sigma^2 t^2/2$.
Now we may write

$$Z_n = \sum_{i=1}^{n} \frac{X_i - \mu}{\sqrt{n}\,\sigma}$$

and

$$M_{Z_n}(t) = M_{\sum (X_i - \mu)}\!\left(\frac{t}{\sqrt{n}\,\sigma}\right) = \left[m\!\left(\frac{t}{\sqrt{n}\,\sigma}\right)\right]^n = \left[1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]\,t^2}{2n\sigma^2}\right]^n, \qquad |\xi| < \frac{|t|}{\sqrt{n}\,\sigma}$$

As $n \to \infty$, $t/(\sqrt{n}\,\sigma) \to 0$, $\xi \to 0$, and $m''(\xi) - \sigma^2 \to 0$, so

$$M_{Z_n}(t) = \left[1 + \frac{t^2}{2n} + \frac{d(n)}{n}\right]^n \qquad (7.3.10)$$

where $d(n) \to 0$ as $n \to \infty$. It follows from limit (7.2.8) that

$$\lim_{n\to\infty} M_{Z_n}(t) = e^{t^2/2} \qquad (7.3.11)$$

or, by Theorem 7.3.1,

$$\lim_{n\to\infty} F_{Z_n}(z) = \Phi(z) \qquad (7.3.12)$$

which means that $Z_n \xrightarrow{d} Z \sim N(0, 1)$. ∎


Note that the variable in limit (7.3.8) also can be related to the sample mean,

$$Z_n = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \qquad (7.3.13)$$


The major application of the CLT is to provide an approximate distribution in 
cases where the exact distribution is unknown or intractable. 


Example 7.3.4
Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution, $X_i \sim \mathrm{UNIF}(0, 1)$,
and let $Y_n = \sum_{i=1}^{n} X_i$. Because $E(X_i) = 1/2$ and $\mathrm{Var}(X_i) = 1/12$, we have the
approximation

$$Y_n \sim N\left(\frac{n}{2}, \frac{n}{12}\right)$$

For example, if $n = 12$, then approximately

$$Y_{12} - 6 \sim N(0, 1)$$

This approximation is so close that it often is used to simulate standard normal
random numbers in computer applications. Of course, this requires 12 uniform
random numbers to be generated to obtain one normal random number.
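The 12-uniform construction can be sketched in Python as follows. This illustrates the idea in the text and is not a recommended production method; the seed, the sample size, and the function name are arbitrary choices. The sketch checks that the draws have mean near 0 and variance near 1, and it also shows one limitation: $Y_{12} - 6$ is bounded by $\pm 6$, so the extreme tails of the normal distribution are never produced.

```python
import random
import statistics


def approx_std_normal(rng: random.Random) -> float:
    """Y_12 - 6, where Y_12 is a sum of 12 UNIF(0, 1) draws: E(Y_12) = 6 and Var(Y_12) = 12 * (1/12) = 1."""
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(2024)
draws = [approx_std_normal(rng) for _ in range(50_000)]
print("mean", round(statistics.fmean(draws), 3))
print("variance", round(statistics.pvariance(draws), 3))
# Unlike a true N(0, 1) variable, Y_12 - 6 can never fall outside [-6, 6].
print("range", round(min(draws), 2), round(max(draws), 2))
```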


7.4 APPROXIMATIONS FOR THE BINOMIAL DISTRIBUTION


Examples 7.3.1 through 7.3.3 demonstrated that various limiting distributions
apply, depending on how the sequence of binomial variables is standardized and
also on assumptions about the behavior of $p$ as $n \to \infty$.

Example 7.3.1 suggests that for a binomial variable $Y_n \sim \mathrm{BIN}(n, p)$, if $n$ is large
and $p$ is small, then approximately $Y_n \sim \mathrm{POI}(np)$. This was discussed in a different
context and an illustration was given in Example 3.2.9 of Chapter 3.

Example 7.3.3 considered a fixed value of $p$, and a suitably standardized
sequence was found to have a standard normal limiting distribution, suggesting a normal
approximation. In particular, it suggests that for large $n$ and fixed $p$, approximately
$Y_n \sim N(np, npq)$. This approximation works best when $p$ is close to 0.5,
because the binomial distribution is symmetric when $p = 0.5$. The accuracy
required in any approximation depends on the application. One guideline is to
use the normal approximation when $np > 5$ and $nq > 5$, but again this would
depend on the accuracy required.


Example 7.4.1
The probability that a basketball player hits a shot is $p = 0.5$. If he takes 20
shots, what is the probability that he hits at least nine? The exact probability is

$$P[Y_{20} \ge 9] = 1 - P[Y_{20} \le 8] = 1 - \sum_{y=0}^{8} \binom{20}{y} 0.5^y\, 0.5^{20-y} = 0.7483$$

A normal approximation is

$$P[Y_{20} \ge 9] = 1 - P[Y_{20} \le 8] \approx 1 - \Phi\left(\frac{8 - 10}{\sqrt{5}}\right) = 1 - \Phi(-0.89) = 0.8133$$

Because the binomial distribution is discrete and the normal distribution is
continuous, the approximation can be improved by making a continuity correction.
In particular, each binomial probability, $b(y; n, p)$, has the same value as the
area of a rectangle of height $b(y; n, p)$ and with the interval $[y - 0.5, y + 0.5]$ as
its base, because the length of the base is one unit. The area of this rectangle can
be approximated by the area under the pdf of $Y \sim N(np, npq)$, which corresponds
to fitting a normal distribution with the same mean and variance as
$Y_n \sim \mathrm{BIN}(n, p)$. This is illustrated for the case of $n = 20$, $p = 0.5$, and $y = 7$ in
Figure 7.3, where the exact probability is $b(7; 20, 0.5) = \binom{20}{7}(0.5)^7(0.5)^{13} = 0.0739$.
The approximation, which is the shaded area in the figure, is

$$\Phi\left(\frac{7.5 - 10}{\sqrt{5}}\right) - \Phi\left(\frac{6.5 - 10}{\sqrt{5}}\right) = \Phi(-1.12) - \Phi(-1.57) = 0.0732$$

FIGURE 7.3 Continuity correction for normal approximation of a binomial probability, $b(7; 20, 0.5) = 0.0739$

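The same idea can be used with other binomial probabilities, such as

$$P[Y_{20} \ge 9] = 1 - P[Y_{20} \le 8] \approx 1 - \Phi\left(\frac{8.5 - 10}{\sqrt{5}}\right) = 1 - \Phi(-0.67) = 0.7486$$

which is much closer to the exact value than without the continuity correction.
The situation is shown in Figure 7.4.
In general, if $Y_n \sim \mathrm{BIN}(n, p)$ and $a \le b$ are integers, then

$$P[a \le Y_n \le b] \approx \Phi\left(\frac{b + 0.5 - np}{\sqrt{npq}}\right) - \Phi\left(\frac{a - 0.5 - np}{\sqrt{npq}}\right)$$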


Continuity corrections also are useful with other discrete distributions that can 
be approximated by the normal distribution. 


FIGURE 7.4 The normal approximation for a binomial distribution 
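The calculations of Example 7.4.1 can be reproduced with the Python standard library, as in the following sketch (the function names are our own). Because the code evaluates $\Phi$ with `math.erf` at unrounded $z$-values, for instance $-2/\sqrt{5} \approx -0.894$ rather than $-0.89$, its approximations may differ from the text's table-based values in the third decimal place.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_pmf(y: int, n: int, p: float) -> float:
    """Binomial probability b(y; n, p)."""
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def binom_interval_approx(a: int, b: int, n: int, p: float) -> float:
    """Continuity-corrected normal approximation to P[a <= Y_n <= b], Y_n ~ BIN(n, p)."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    return normal_cdf((b + 0.5 - mean) / sd) - normal_cdf((a - 0.5 - mean) / sd)


n, p = 20, 0.5
mean, sd = n * p, math.sqrt(n * p * (1.0 - p))

exact = 1.0 - sum(binom_pmf(y, n, p) for y in range(9))  # P[Y_20 >= 9]
no_correction = 1.0 - normal_cdf((8 - mean) / sd)        # uncorrected approximation
corrected = binom_interval_approx(9, n, n, p)            # continuity-corrected approximation

print(round(exact, 4), round(no_correction, 4), round(corrected, 4))
print(round(binom_pmf(7, n, p), 4), round(binom_interval_approx(7, 7, n, p), 4))
```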
Example 7.4.2 Suppose that $Y_n \sim \mathrm{POI}(n)$, where $n$ is a positive integer. From the results
of Chapter 6, we know that $Y_n$ has the same distribution as a sum $\sum_{i=1}^{n} X_i$,
where $X_1, \ldots, X_n$ are independent, $X_i \sim \mathrm{POI}(1)$. According to the CLT,
$Z_n = (Y_n - n)/\sqrt{n} \xrightarrow{d} Z \sim N(0, 1)$, which suggests the approximation $Y_n \sim N(n, n)$
for large $n$. For example, if $n = 20$, we desire to find $P[10 \le Y_{20} \le 30]$. The exact
value is $\sum_{y=10}^{30} e^{-20}(20)^y/y! = 0.982$, and the approximate value is

$$\Phi\left(\frac{30.5 - 20}{\sqrt{20}}\right) - \Phi\left(\frac{9.5 - 20}{\sqrt{20}}\right) = \Phi(2.35) - \Phi(-2.35) = 0.981$$

which is quite close to the exact value.
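A Python sketch of the computation in Example 7.4.2 follows (the helper names are illustrative). The Poisson probabilities are evaluated on the log scale with `math.lgamma`, so that $20^y$ and $y!$ never have to be formed explicitly, and the normal approximation uses the same continuity correction as the text.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_pmf(y: int, mu: float) -> float:
    """e^{-mu} mu^y / y!, evaluated on the log scale to avoid overflow for large y."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


mu, a, b = 20, 10, 30
exact = sum(poisson_pmf(y, mu) for y in range(a, b + 1))
approx = normal_cdf((b + 0.5 - mu) / math.sqrt(mu)) - normal_cdf((a - 0.5 - mu) / math.sqrt(mu))
print(round(exact, 4), round(approx, 4))
```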
From the CLT it follows that when the sample mean is standardized according to
equation (7.3.13), the corresponding sequence $Z_n \xrightarrow{d} Z \sim N(0, 1)$.

It would not be unreasonable to consider the distribution of the sample mean
$\bar{X}_n$ as approximately $N(\mu, \sigma^2/n)$ for large $n$. This is an example of a more general
notion.


Definition 7.5.1
If $Y_1, Y_2, \ldots$ is a sequence of random variables and $m$ and $c$ are constants such that

$$Z_n = \frac{Y_n - m}{c/\sqrt{n}} \xrightarrow{d} Z \sim N(0, 1) \qquad (7.5.1)$$

as $n \to \infty$, then $Y_n$ is said to have an asymptotic normal distribution with asymptotic
mean $m$ and asymptotic variance $c^2/n$.


Example 7.5.1 Consider the random sample of Example 4.6.3, which involved $n = 40$ lifetimes of
electrical parts, $X_i \sim \mathrm{EXP}(100)$. By the CLT, $\bar{X}_n$ has an asymptotic normal distribution
with mean $m = 100$ and variance $c^2/n = (100)^2/40 = 250$.
ASYMPTOTIC DISTRIBUTION OF CENTRAL ORDER STATISTICS


In Section 7.2 we showed several examples that involved extreme order statistics,
such as the largest and smallest, with limiting distributions that were not normal.
Under certain conditions, it is possible to show that “central” order statistics are
asymptotically normal.


Theorem 7.5.1
Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution with a pdf
$f(x)$ that is continuous and nonzero at the $p$th percentile, $x_p$, for $0 < p < 1$. If
$k/n \to p$ (with $k - np$ bounded), then the sequence of $k$th order statistics, $X_{k:n}$, is
asymptotically normal with mean $x_p$ and variance $c^2/n$, where

$$c^2 = \frac{p(1 - p)}{[f(x_p)]^2} \qquad (7.5.2)$$


Example 7.5.2
Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution,
$X_i \sim \mathrm{EXP}(1)$, so that $f(x) = e^{-x}$ and $F(x) = 1 - e^{-x}$; $x > 0$. For odd $n$, let
$k = (n + 1)/2$, so that $Y_k = X_{k:n}$ is the sample median. If $p = 0.5$, then the median
is $x_{0.5} = -\ln(0.5) = \ln 2$ and

$$c^2 = \frac{0.5(1 - 0.5)}{[f(\ln 2)]^2} = \frac{0.25}{(0.5)^2} = 1$$

Thus, $X_{k:n}$ is asymptotically normal with asymptotic mean $x_{0.5} = \ln 2$ and
asymptotic variance $c^2/n = 1/n$.


Example 7.5.3
Suppose that $X_1, \ldots, X_n$ is a random sample from a uniform distribution,
$X_i \sim \mathrm{UNIF}(0, 1)$, so that $f(x) = 1$ and $F(x) = x$; $0 < x < 1$. Also assume that $n$ is
odd and $k = (n + 1)/2$, so that $Y_k = X_{k:n}$ is the middle order statistic or sample
median. Formula (6.5.3) gives the pdf of $Y_k$, which has a special form because
$k - 1 = n - k = (n - 1)/2$ in this example. The pdf is

$$g_k(y) = \frac{n!}{\{[(n - 1)/2]!\}^2}\,[y(1 - y)]^{(n-1)/2}, \qquad 0 < y < 1 \qquad (7.5.3)$$

According to the theorem, with $p = 0.5$, the $p$th percentile is $x_{0.5} = 0.5$ and
$c^2 = 0.5(1 - 0.5)/[1]^2 = 0.25$, so that $Z_n = \sqrt{n}\,(X_{k:n} - 0.5)/0.5 \xrightarrow{d} Z \sim N(0, 1)$.
Actually, this is strongly suggested by the pdf (7.5.3) after the transformation
$z = \sqrt{n}\,(y - 0.5)/0.5$, which has inverse transformation $y = 0.5 + 0.5z/\sqrt{n}$ and
Jacobian $J = 0.5/\sqrt{n}$. Because $y(1 - y) = 0.25(1 - z^2/n)$, the resulting pdf is

$$f_{Z_n}(z) = \frac{n!\,(0.5)^{n-1}}{\{[(n - 1)/2]!\}^2} \cdot \frac{1}{2\sqrt{n}}\left(1 - \frac{z^2}{n}\right)^{(n-1)/2}, \qquad |z| < \sqrt{n} \qquad (7.5.4)$$
It follows from limit (7.2.7), and the fact that $(1 - z^2/n)^{-1/2} \to 1$, that

$$\lim_{n\to\infty}\left(1 - \frac{z^2}{n}\right)^{(n-1)/2} = e^{-z^2/2}$$

and it is also possible to show that the constant in (7.5.4) approaches $1/\sqrt{2\pi}$ as
$n \to \infty$.

Thus, in the example, the sequence of pdfs corresponding to $Z_n$ converges to a
standard normal pdf. It is not obvious that this implies that the CDFs also
converge, but this can be proved. However, we will not pursue this point.
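Formula (7.5.2) is easy to evaluate directly, and a small simulation gives a feel for its accuracy. The Python sketch below computes $c^2$ for Examples 7.5.2 and 7.5.3. Then, as an illustration with our own choices of $n = 101$, 4000 replications, and a fixed seed, it compares the simulated variance of the EXP(1) sample median with the asymptotic value $c^2/n = 1/n$. The helper name `asymptotic_c2` is hypothetical, and the agreement is only approximate for finite $n$.

```python
import math
import random
import statistics


def asymptotic_c2(p: float, pdf, quantile) -> float:
    """c^2 = p(1 - p) / [f(x_p)]^2 from Theorem 7.5.1, eq. (7.5.2)."""
    x_p = quantile(p)
    return p * (1.0 - p) / pdf(x_p) ** 2


# Example 7.5.2: EXP(1), f(x) = e^{-x}, x_p = -ln(1 - p)
c2_exp = asymptotic_c2(0.5, lambda x: math.exp(-x), lambda q: -math.log(1.0 - q))
# Example 7.5.3: UNIF(0, 1), f(x) = 1, x_p = p
c2_unif = asymptotic_c2(0.5, lambda x: 1.0, lambda q: q)
print(c2_exp, c2_unif)

# Simulation for EXP(1): the variance of the sample median should be near c^2 / n = 1/n.
rng = random.Random(11)
n, reps = 101, 4000
medians = [statistics.median(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
print(round(statistics.fmean(medians), 4), round(math.log(2), 4))
print(round(statistics.pvariance(medians), 5), round(c2_exp / n, 5))
```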


7.6 PROPERTIES OF STOCHASTIC CONVERGENCE


We encountered several examples in which a sequence of random variables converged
stochastically to a constant. For instance, in Example 7.3.2 we discovered
that the sample proportion converges stochastically to the population proportion.
Clearly, this is a useful general concept for evaluating estimators of
unknown population parameters, and it would be reasonable to require that a
good estimator should have the property that it converges stochastically to the
parameter value as the sample size approaches infinity.

The following theorem, stated without proof, provides an alternate criterion for
showing stochastic convergence.


Theorem 7.6.1
The sequence $Y_1, Y_2, \ldots$ converges stochastically to $c$ if and only if for every
$\varepsilon > 0$,

$$\lim_{n\to\infty} P[|Y_n - c| < \varepsilon] = 1 \qquad (7.6.1)$$

A sequence of random variables that satisfies Theorem 7.6.1 is also said to
converge in probability to the constant $c$, denoted by $Y_n \xrightarrow{p} c$. The notion of convergence
in probability will be discussed in a more general context in the next
section.


Example 7.6.1
Example 7.3.2 verified the so-called Bernoulli Law of Large Numbers with the
MGF approach. It also can be verified with the previous theorem and the Chebychev
inequality. Specifically, the mean and variance of $\hat{p}_n$ are $E(\hat{p}_n) = p$ and
$\mathrm{Var}(\hat{p}_n) = pq/n$, so that

$$P[|\hat{p}_n - p| < \varepsilon] \ge 1 - \frac{pq}{\varepsilon^2 n} \qquad (7.6.2)$$

for any $\varepsilon > 0$, so $\lim_{n\to\infty} P[|\hat{p}_n - p| < \varepsilon] = 1$.
This same approach can be used to prove a more general result, usually
referred to as the Law of Large Numbers (LLN).


If $X_1, \ldots, X_n$ is a random sample from a distribution with finite mean $\mu$ and
variance $\sigma^2$, then the sequence of sample means converges in probability to $\mu$,
$\bar{X}_n \xrightarrow{p} \mu$.

Proof
This follows from the fact that $E(\bar{X}_n) = \mu$, $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$, and thus

$$P[|\bar{X}_n - \mu| < \varepsilon] \ge 1 - \frac{\sigma^2}{\varepsilon^2 n} \qquad (7.6.3)$$

so that $\lim_{n\to\infty} P[|\bar{X}_n - \mu| < \varepsilon] = 1$. ∎


These results further illustrate that the sample mean provides a good estimate
of the population mean, in the sense that the probability approaches 1 that $\bar{X}_n$ is
arbitrarily close to $\mu$ as $n \to \infty$.

Actually, the right side of inequality (7.6.3) provides additional information.
Namely, for any $\varepsilon > 0$ and $0 < \delta < 1$, if $n > \sigma^2/(\varepsilon^2\delta)$, then

$$P[\mu - \varepsilon < \bar{X}_n < \mu + \varepsilon] \ge 1 - \delta$$
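The sample-size condition $n > \sigma^2/(\varepsilon^2\delta)$ can be turned into a small helper, sketched here in Python; the name `chebyshev_sample_size` is hypothetical. Exact rational arithmetic from the `fractions` module is used so that floating-point rounding cannot spoil the strict inequality when the bound is a whole number. Since Chebychev's inequality is conservative, the resulting $n$ is sufficient but usually much larger than necessary.

```python
import math
from fractions import Fraction


def chebyshev_sample_size(sigma2, eps, delta) -> int:
    """Smallest integer n with n > sigma2 / (eps**2 * delta).

    By inequality (7.6.3), such an n gives P[mu - eps < Xbar_n < mu + eps] >= 1 - delta.
    Pass decimal values as strings (e.g. "0.05") so Fraction stores them exactly.
    """
    bound = Fraction(sigma2) / (Fraction(eps) ** 2 * Fraction(delta))
    return math.floor(bound) + 1


# Sample proportion: sigma^2 = pq <= 1/4, so this n works for every p.
n = chebyshev_sample_size("0.25", "0.05", "0.05")
print(n)
```

For a sample proportion with $\varepsilon = \delta = 0.05$, the bound is $0.25/(0.0025 \times 0.05) = 2000$, so the helper returns 2001, the smallest integer strictly exceeding it.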


The following theorem, which is stated without proof, asserts that a sequence 
of asymptotically normal variables converges in probability to the asymptotic 
mean. 


If $Z_n = \sqrt{n}\,(Y_n - m)/c \xrightarrow{d} Z \sim N(0, 1)$, then $Y_n \xrightarrow{p} m$.


We found in Examples 7.5.2 and 7.5.3 that the sample median $X_{k:n}$ is asymptotically
normal with asymptotic mean $x_{0.5}$, the distribution median. It follows
from the theorem that $X_{k:n} \xrightarrow{p} x_{0.5}$ as $n \to \infty$, with $k/n \to 0.5$.
Similarly, under the conditions of Theorem 7.5.1, it follows that if $k/n \to p$, then
the $k$th smallest order statistic converges stochastically to the $p$th percentile,

$$X_{k:n} \xrightarrow{p} x_p$$
7.7 ADDITIONAL LIMIT THEOREMS


Definition 7.7.1
Convergence in Probability. The sequence of random variables $Y_n$ is said to converge
in probability to $Y$, written $Y_n \xrightarrow{p} Y$, if for every $\varepsilon > 0$

$$\lim_{n\to\infty} P[|Y_n - Y| < \varepsilon] = 1 \qquad (7.7.1)$$

It follows from equation (7.6.1) that stochastic convergence is equivalent to
convergence in probability to the constant $c$, and for the most part we will
restrict attention to this special case. Note that convergence in probability is a
stronger property than convergence in distribution. This should not be surprising,
because convergence in distribution does not impose any requirement on
the joint distribution of $Y_n$ and $Y$, whereas convergence in probability does. The
following theorem is stated without proof.


Theorem 7.7.1
For a sequence of random variables, if

$$Y_n \xrightarrow{p} Y$$

then

$$Y_n \xrightarrow{d} Y$$

For the special case $Y = c$, the limiting distribution is the degenerate distribution
$P[Y = c] = 1$. This was the condition we initially used to define stochastic
convergence.


Theorem 7.7.2
If $Y_n \xrightarrow{p} c$, then for any function $g(y)$ that is continuous at $c$,

$$g(Y_n) \xrightarrow{p} g(c)$$

Proof
Because $g(y)$ is continuous at $c$, it follows that for every $\varepsilon > 0$ a $\delta > 0$ exists such
that $|y - c| < \delta$ implies $|g(y) - g(c)| < \varepsilon$. This, in turn, implies that

$$P[|g(Y_n) - g(c)| < \varepsilon] \ge P[|Y_n - c| < \delta]$$

because $P(B) \ge P(A)$ whenever $A \subset B$. But because $Y_n \xrightarrow{p} c$, it follows for every
$\delta > 0$ that

$$\lim_{n\to\infty} P[|g(Y_n) - g(c)| < \varepsilon] \ge \lim_{n\to\infty} P[|Y_n - c| < \delta] = 1$$

The left-hand limit cannot exceed 1, so it must equal 1, and $g(Y_n) \xrightarrow{p} g(c)$. ∎


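Theorem 7.7.2 is also valid if $Y_n$ and $c$ are $k$-dimensional vectors. Thus this
theorem is very useful, and examples of the types of results that follow are listed
in the next theorem.


Theorem 7.7.3
If $X_n$ and $Y_n$ are two sequences of random variables such that $X_n \xrightarrow{p} c$ and
$Y_n \xrightarrow{p} d$, then:

1. $aX_n + bY_n \xrightarrow{p} ac + bd$.
2. $X_n Y_n \xrightarrow{p} cd$.
3. $X_n/c \xrightarrow{p} 1$, for $c \ne 0$.
4. $1/X_n \xrightarrow{p} 1/c$ if $P[X_n \ne 0] = 1$ for all $n$, $c \ne 0$.
5. $\sqrt{X_n} \xrightarrow{p} \sqrt{c}$ if $P[X_n \ge 0] = 1$ for all $n$.


Example 7.7.1
Suppose that $Y \sim \mathrm{BIN}(n, p)$. We know that $\hat{p} = Y/n \xrightarrow{p} p$. Thus it follows that
$\hat{p}(1 - \hat{p}) \xrightarrow{p} p(1 - p)$.


The following theorem is helpful in determining limits in distribution.


Theorem 7.7.4 Slutsky’s Theorem If $X_n$ and $Y_n$ are two sequences of random variables such
that $X_n \xrightarrow{p} c$ and $Y_n \xrightarrow{d} Y$, then:

1. $X_n + Y_n \xrightarrow{d} c + Y$.
2. $X_n Y_n \xrightarrow{d} cY$.
3. $Y_n/X_n \xrightarrow{d} Y/c$; $c \ne 0$.

Note that as a special case $X_n$ could be an ordinary numerical sequence such
as $X_n = n/(n - 1)$.


Example 7.7.2
Consider a random sample of size $n$ from a Bernoulli distribution,
$X_i \sim \mathrm{BIN}(1, p)$. We know that

$$\frac{\hat{p}_n - p}{\sqrt{p(1 - p)/n}} \xrightarrow{d} Z \sim N(0, 1)$$

We also know that $\hat{p}_n(1 - \hat{p}_n) \xrightarrow{p} p(1 - p)$, so dividing by $[\hat{p}_n(1 - \hat{p}_n)/p(1 - p)]^{1/2}$ gives

$$\frac{\hat{p}_n - p}{\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}} \xrightarrow{d} Z \sim N(0, 1) \qquad (7.7.2)$$


Theorem 7.7.2 also may be generalized.


Theorem 7.7.5
If $Y_n \xrightarrow{d} Y$, then for any continuous function $g(y)$, $g(Y_n) \xrightarrow{d} g(Y)$.
Note that $g(y)$ is assumed not to depend on $n$.


Theorem 7.7.6
If $\sqrt{n}\,(Y_n - m)/c \xrightarrow{d} Z \sim N(0, 1)$, and if $g(y)$ has a nonzero derivative at $y = m$,
$g'(m) \ne 0$, then

$$\frac{\sqrt{n}\,[g(Y_n) - g(m)]}{|c\,g'(m)|} \xrightarrow{d} Z \sim N(0, 1)$$

Proof
Define $u(y) = [g(y) - g(m)]/(y - m) - g'(m)$ if $y \ne m$, and let $u(m) = 0$. It follows
that $u(y)$ is continuous at $m$ with $u(m) = 0$, and thus, because $Y_n \xrightarrow{p} m$, Theorem 7.7.2
gives $g'(m) + u(Y_n) \xrightarrow{p} g'(m)$. Furthermore,

$$\frac{\sqrt{n}\,[g(Y_n) - g(m)]}{c\,g'(m)} = \frac{\sqrt{n}\,(Y_n - m)}{c} \cdot \frac{g'(m) + u(Y_n)}{g'(m)}$$

From Theorem 7.7.3, we have $[g'(m) + u(Y_n)]/g'(m) \xrightarrow{p} 1$, and the result follows
from Theorem 7.7.4. (Because $-Z \sim N(0, 1)$ as well, dividing by $|c\,g'(m)|$ rather than
$c\,g'(m)$ does not change the limit.) ∎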
According to our earlier interpretation of an asymptotic normal distribution,
we conclude that for large $n$, if $Y_n \sim N(m, c^2/n)$, then approximately

$$g(Y_n) \sim N\left(g(m), \frac{c^2[g'(m)]^2}{n}\right) \qquad (7.7.3)$$

Note the similarities between this result and the approximate mean and
variance formulas given in Section 2.4.


Example 7.7.3
The Central Limit Theorem says that the sample mean is asymptotically normally
distributed,

$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} Z \sim N(0, 1)$$

or, approximately for large $n$,

$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

We now know from Theorem 7.7.6 that differentiable functions of $\bar{X}_n$ also will be
asymptotically normally distributed. For example, if $g(\bar{X}_n) = \bar{X}_n^2$, then $g'(\mu) = 2\mu$,
and, provided $\mu \ne 0$ so that $g'(\mu) \ne 0$, approximately

$$\bar{X}_n^2 \sim N\left(\mu^2, \frac{4\mu^2\sigma^2}{n}\right)$$
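A simulation can illustrate Example 7.7.3. The Python sketch below assumes, purely for illustration, that $X_i \sim \mathrm{EXP}(2)$, so that $\mu = 2$ and $\sigma^2 = 4$; the sample size, number of replications, and seed are arbitrary. It compares the simulated mean and variance of $\bar{X}_n^2$ with the asymptotic values $\mu^2$ and $4\mu^2\sigma^2/n$ from the approximation above.

```python
import random
import statistics

# Assumed setup for illustration only: X_i ~ EXP(theta) with theta = 2, so mu = 2 and sigma^2 = 4.
theta, n, reps = 2.0, 200, 3000
mu, sigma2 = theta, theta**2

rng = random.Random(5)
squares = []
for _ in range(reps):
    # expovariate takes the rate 1/theta, so each draw has mean theta
    xbar = statistics.fmean(rng.expovariate(1.0 / theta) for _ in range(n))
    squares.append(xbar**2)  # g(Xbar_n) = Xbar_n^2

asym_mean = mu**2                        # g(mu)
asym_var = (2.0 * mu) ** 2 * sigma2 / n  # [g'(mu)]^2 sigma^2 / n
print(round(statistics.fmean(squares), 3), asym_mean)
print(round(statistics.pvariance(squares), 4), round(asym_var, 4))
```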


7.8* ASYMPTOTIC DISTRIBUTIONS OF EXTREME ORDER STATISTICS

* Advanced (or optional) topic


As noted in Section 7.5, the central order statistics, $X_{k:n}$, are asymptotically normally
distributed as $n \to \infty$ and $k/n \to p$. If extreme order statistics such as $X_{1:n}$,
$X_{2:n}$, and $X_{n:n}$ are standardized so that they have a nondegenerate limiting distribution,
this limiting distribution will not be normal. Examples of such limiting
distributions were given earlier. It can be shown that the nondegenerate limiting
distribution of an extreme variable must belong to one of three possible types of
distributions. Thus, these three types of distributions are useful when studying
extremes, analogous to the way the normal distribution is useful when studying
means through the Central Limit Theorem.

For example, in studying floods, the variable of interest may be the maximum
flood stage during the year. This variable may behave approximately like the
maximum of a large number of independent flood levels attained through the
year. Thus, one of the three limiting types may provide a good model for this
variable. Similarly, the strength of a chain is equal to that of its weakest link, or
the strength of a ceramic may be the strength at its weakest flaw, where the
number of flaws, $n$, may be quite large. Also, the lifetime of a system of independent
and identically distributed components connected in series is equal to the
minimum lifetime of the components. Again, one of the limiting distributions may
provide a good approximation for the lifetime of the system, even though the
distribution of the lifetimes of the individual components may not be known.
Similarly, the lifetime of a system of components connected in parallel is equal to
the maximum lifetime of the components.

The following theorems, which are stated without proof, are useful in studying
the asymptotic behavior of extreme order statistics.


Theorem 7.8.1
If the limit of a sequence of CDFs is a continuous CDF, $F(y) = \lim_{n\to\infty} F_n(y)$, then
for any $a_n > 0$ and $b_n$,

$$\lim_{n\to\infty} F_n(a_n y + b_n) = F(ay + b) \qquad (7.8.1)$$

if and only if $\lim_{n\to\infty} a_n = a > 0$ and $\lim_{n\to\infty} b_n = b$.


Theorem 7.8.2
If the limit of a sequence of CDFs is a continuous CDF, and if
$\lim_{n\to\infty} F_n(a_n y + b_n) = G(y)$ for $a_n > 0$ and all real $y$, then $\lim_{n\to\infty} F_n(\alpha_n y + \beta_n) = G(y)$
for $\alpha_n > 0$, if and only if $\alpha_n/a_n \to 1$ and $(\beta_n - b_n)/a_n \to 0$ as $n \to \infty$.


LIMITING DISTRIBUTIONS OF MAXIMUMS 


Let $X_{1:n}, \ldots, X_{n:n}$ denote an ordered random sample of size $n$ from a distribution with CDF $F(x)$. In the context of extreme-value theory, the maximum $X_{n:n}$ is said to have a (nondegenerate) limiting distribution $G(y)$ if there exist sequences of standardizing constants $\{a_n\}$ and $\{b_n\}$ with $a_n > 0$ such that the standardized variable, $Y_n = (X_{n:n} - b_n)/a_n$, converges in distribution to $G(y)$,

$$Y_n = \frac{X_{n:n} - b_n}{a_n} \xrightarrow{d} Y \sim G(y) \tag{7.8.2}$$

That is, if we say that $X_{n:n}$ has a limiting distribution of type $G$, we will mean that the limiting distribution of the standardized variable $Y_n$ is a nondegenerate distribution $G(y)$. As suggested by Theorems 7.8.1 and 7.8.2, if $G(y)$ is continuous, the sequence of standardizing constants will not be unique. However, it is not possible to obtain a limiting distribution of a different type by changing the standardizing constants.
Recall that the exact distribution of $X_{n:n}$ is given by

$$F_{n:n}(x) = [F(x)]^n \tag{7.8.3}$$

If we consider $Y_n = (X_{n:n} - b_n)/a_n$, then the exact distribution of $Y_n$ is

$$G_n(y) = P[Y_n \le y] = F_{n:n}(a_n y + b_n) = [F(a_n y + b_n)]^n \tag{7.8.4}$$

Thus, the limiting distribution of $X_{n:n}$ (or more correctly $Y_n$) is given by

$$G(y) = \lim_{n\to\infty} G_n(y) = \lim_{n\to\infty} [F(a_n y + b_n)]^n \tag{7.8.5}$$

Equation (7.8.5) therefore provides a direct approach for determining a limiting extreme-value distribution, if sequences $\{a_n\}$ and $\{b_n\}$ can be found that result in a nondegenerate limit.

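Recall from Example 7.2.6 that if $X \sim EXP(1)$, then we may let $a_n = 1$ and $b_n = \ln n$. Thus,

$$G_n(y) = [F(y + \ln n)]^n = \left[1 - \frac{e^{-y}}{n}\right]^n \tag{7.8.6}$$

and thus,

$$G(y) = \lim_{n\to\infty}\left[1 - \frac{e^{-y}}{n}\right]^n = \exp(-e^{-y}) \tag{7.8.7}$$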


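The convergence in (7.8.7) can be seen numerically by evaluating the exact standardized CDF (7.8.6) for increasing $n$. This is a minimal sketch using Python's standard library. The function names and the evaluation grid are illustrative choices, and `math.log1p` keeps $n\ln(1 - e^{-y}/n)$ accurate for large $n$.

```python
import math

def standardized_max_cdf(y, n):
    """G_n(y) = [F(y + ln n)]^n = [1 - e^{-y}/n]^n for X ~ EXP(1), eq. (7.8.6)."""
    x = math.exp(-y)
    if x >= n:  # y + ln n <= 0, where F(y + ln n) = 0
        return 0.0
    return math.exp(n * math.log1p(-x / n))

def type1_cdf(y):
    """Limit in (7.8.7): exp(-e^{-y})."""
    return math.exp(-math.exp(-y))

grid = (-2.0, -1.0, 0.0, 1.0, 3.0)
max_error = {}
for n in (10, 100, 10_000, 1_000_000):
    max_error[n] = max(abs(standardized_max_cdf(y, n) - type1_cdf(y)) for y in grid)
    print(f"n = {n:>9}: max |G_n(y) - G(y)| on the grid = {max_error[n]:.2e}")
```

The maximum gap shrinks roughly like $1/n$, which matches the expansion $n\ln(1 - x/n) = -x - x^2/(2n) + \cdots$.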
The three possible types of limiting distributions are provided in the following theorem, which is stated without proof.

Theorem 7.8.3 If $Y_n = (X_{n:n} - b_n)/a_n$ has a limiting distribution $G(y)$, then $G(y)$ must be one of the following three types of extreme-value distributions:

1. Type I (for maximums) (Exponential type)
$$G_1(y) = \exp(-e^{-y}) \qquad -\infty < y < \infty \tag{7.8.8}$$

2. Type II (for maximums) (Cauchy type)
$$G_2(y) = \exp(-y^{-\gamma}) \qquad y > 0,\ \gamma > 0 \tag{7.8.9}$$

3. Type III (for maximums) (Limited type)
$$G_3(y) = \begin{cases} \exp[-(-y)^{\gamma}] & y < 0,\ \gamma > 0 \\ 1 & y \ge 0 \end{cases} \tag{7.8.10}$$


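For reference, the three standardized limiting CDFs of Theorem 7.8.3 can be coded directly. The sketch below is only an illustration, and the names `g1`, `g2`, `g3` are ours. As the next paragraph notes, location and scale parameters would have to be added before applying any of these to nonstandardized data.

```python
import math

def g1(y):
    """Type I (exponential type) CDF for maximums, eq. (7.8.8)."""
    return math.exp(-math.exp(-y))

def g2(y, gamma):
    """Type II (Cauchy type) CDF for maximums, eq. (7.8.9); zero for y <= 0."""
    if y <= 0:
        return 0.0
    return math.exp(-y ** (-gamma))      # parsed as exp(-(y ** -gamma))

def g3(y, gamma):
    """Type III (limited type) CDF for maximums, eq. (7.8.10); one for y >= 0."""
    if y >= 0:
        return 1.0
    return math.exp(-(-y) ** gamma)      # parsed as exp(-((-y) ** gamma))

for y in (-2.0, -0.5, 0.5, 2.0):
    print(f"y = {y:5.1f}: G1 = {g1(y):.4f}, G2 (gamma=2) = {g2(y, 2.0):.4f}, "
          f"G3 (gamma=2) = {g3(y, 2.0):.4f}")
```

Python's `**` binds more tightly than unary minus, so the two commented lines compute the intended expressions without extra parentheses.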
The limiting distribution of the maximum from densities such as the normal, lognormal, logistic, and gamma distributions is a Type I extreme-value distribution. Generally speaking, such densities have tails no thicker than the exponential distribution. This class includes a large number of the most common distributions, and the Type I extreme-value distribution (for maximums) should provide a useful model for many types of variables related to maximums. Of course, a location parameter and a scale parameter would need to be introduced into the model when applied directly to the nonstandardized variable.

The Type II limiting distribution results for maximums from densities with thicker tails, such as the Cauchy distribution. The Type III case may arise from densities with finite upper limits on the range of the variables.

The following theorem provides an alternative form to equation (7.8.5), which is sometimes more convenient for carrying out the limit.

Theorem 7.8.4 (Gnedenko) In determining the limiting distribution of $Y_n = (X_{n:n} - b_n)/a_n$,

$$\lim_{n\to\infty} G_n(y) = \lim_{n\to\infty} [F(a_n y + b_n)]^n = G(y) \tag{7.8.11}$$

if and only if

$$\lim_{n\to\infty} n[1 - F(a_n y + b_n)] = -\ln G(y) \tag{7.8.12}$$


In many cases the greatest difficulty involves determining suitable standardizing sequences so that a nondegenerate limiting distribution will result. For a given CDF, $F(x)$, it is possible to use Theorem 7.8.4 to solve for $a_n$ and $b_n$ in terms of $F(x)$ for each of the three possible types of limiting distributions. Thus, if the limiting type for $F(x)$ is known, then $a_n$ and $b_n$ can be computed. If the type is not known, then $a_n$ and $b_n$ can be computed for each type and then applied to see which type works out. One property of a CDF that is useful in expressing the standardizing constants is its "characteristic largest value."


Definition 7.8.1 The characteristic largest value, $u_n$, of a CDF $F(x)$ is defined by the equation

$$n[1 - F(u_n)] = 1 \tag{7.8.13}$$

For a random sample of size $n$ from $F(x)$, the expected number of observations that will exceed $u_n$ is 1. The probability that one observation will exceed $u_n$ is

$$p = P[X > u_n] = 1 - F(u_n)$$

and the expected number for $n$ independent observations is

$$np = n[1 - F(u_n)] = 1$$
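When $F(x)$ has no convenient inverse, equation (7.8.13) can be solved numerically. The sketch below uses bisection. It assumes $F$ is continuous and nondecreasing, and that the caller supplies a bracket $[lo, hi]$ with $n[1 - F(lo)] \ge 1 \ge n[1 - F(hi)]$. The function name and the three test distributions (exponential with $\theta = 2$, $F(x) = 1 - x^{-3}$, and uniform) are illustrative choices. For $EXP(\theta)$ the result should agree with the closed form $u_n = \theta\ln n$ obtained in Example 7.8.1.

```python
import math

def characteristic_largest_value(cdf, n, lo, hi, iterations=200):
    """Solve n[1 - F(u)] = 1, eq. (7.8.13), for u by bisection.

    Assumes cdf is continuous and nondecreasing on [lo, hi] and that
    n[1 - cdf(lo)] >= 1 >= n[1 - cdf(hi)].
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if n * (1.0 - cdf(mid)) > 1.0:
            lo = mid   # more than one expected exceedance: u_n lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = 2.0

def exp_cdf(x):          # EXP(theta)
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

def pareto3_cdf(x):      # F(x) = 1 - x^(-3), x > 1
    return 1.0 - x ** -3 if x > 1 else 0.0

def unif_cdf(x):         # UNIF(0, 1)
    return min(max(x, 0.0), 1.0)

u_exp = characteristic_largest_value(exp_cdf, 100, 0.0, 1000.0)
u_par = characteristic_largest_value(pareto3_cdf, 1000, 1.0, 1e6)
u_uni = characteristic_largest_value(unif_cdf, 100, 0.0, 1.0)
print(u_exp, theta * math.log(100))   # both about 9.2103
print(u_par)                          # about 10 = 1000 ** (1/3)
print(u_uni)                          # about 0.99 = 1 - 1/100
```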
Theorem 7.8.5 Let $X \sim F(x)$, and assume that $Y_n = (X_{n:n} - b_n)/a_n$ has a limiting distribution.

1. If $F(x)$ is continuous and strictly increasing, then the limiting distribution of $Y_n$ is of exponential type if and only if
$$\lim_{n\to\infty} n[1 - F(a_n y + b_n)] = e^{-y} \qquad -\infty < y < \infty \tag{7.8.14}$$
where $b_n = u_n$ and $a_n$ is the solution of $F(a_n + u_n) = 1 - (ne)^{-1}$.

2. $G(y)$ is of Cauchy type if and only if
$$\lim_{y\to\infty} \frac{1 - F(y)}{1 - F(ky)} = k^{\gamma} \qquad k > 0,\ \gamma > 0 \tag{7.8.15}$$
and in this case, $a_n = u_n$ and $b_n = 0$.

3. $G(y)$ is of limited type if and only if
$$\lim_{y\to 0^-} \frac{1 - F(ky + x_0)}{1 - F(y + x_0)} = k^{\gamma} \qquad k > 0 \tag{7.8.16}$$
where $x_0 = \sup\{x \mid F(x) < 1\}$, the upper limit of $x$. Also $b_n = x_0$ and $a_n = x_0 - u_n$.


Example 7.8.1 Suppose again that $X \sim EXP(\theta)$, and we are interested in the maximum of a random sample of size $n$. The characteristic largest value $u_n$ is obtained from

$$n[1 - F(u_n)] = n[1 - (1 - e^{-u_n/\theta})] = 1$$

which gives

$$u_n = \theta \ln n$$

We happen to know that the exponential density falls in the Type I case, so we will try that case first. We have $b_n = u_n = \theta \ln n$, and $a_n$ is determined from

$$F(a_n + u_n) = 1 - e^{-(a_n + \theta\ln n)/\theta} = 1 - \frac{1}{n}e^{-a_n/\theta} = 1 - \frac{1}{ne}$$

which gives $a_n = \theta$. Thus, if the exponential density is in the Type I case, we know that

$$Y_n = \frac{X_{n:n} - \theta\ln n}{\theta} \xrightarrow{d} Y \sim G_1(y) \tag{7.8.17}$$

This is verified easily by using condition 1 of Theorem 7.8.5, because

$$\lim_{n\to\infty} n[1 - F(a_n y + b_n)] = \lim_{n\to\infty} n\,e^{-(\theta y + \theta\ln n)/\theta} = \lim_{n\to\infty} e^{-y} = e^{-y} \qquad -\infty < y < \infty$$

Example 7.8.2 The density for the CDF $F(x) = 1 - x^{-\theta}$, $x > 1$, has a thick upper tail, so one would expect the limiting distribution to be of Cauchy type. If we check the Cauchy-type condition given in Theorem 7.8.5, we find

$$\lim_{y\to\infty}\frac{1 - F(y)}{1 - F(ky)} = \lim_{y\to\infty}\frac{y^{-\theta}}{(ky)^{-\theta}} = k^{\theta}$$

so the limiting distribution is of Cauchy type with $\gamma = \theta$. Also, we have $n[1 - F(u_n)] = n u_n^{-\theta} = 1$, which gives $u_n = n^{1/\theta}$, and we let $b_n = 0$ in this case:

$$a_n = u_n = n^{1/\theta}, \qquad b_n = 0 \tag{7.8.18}$$

Thus we know that

$$Y_n = \frac{X_{n:n}}{n^{1/\theta}} \xrightarrow{d} Y \sim G_2(y) \tag{7.8.19}$$

Now that we know how to standardize the variable, we also can verify this result directly by Theorem 7.8.4. We have

$$\lim_{n\to\infty} n[1 - F(a_n y + b_n)] = \lim_{n\to\infty} n\,(n^{1/\theta}y)^{-\theta} = y^{-\theta} = -\ln G(y) \tag{7.8.20}$$

so $G(y) = \exp(-y^{-\theta})$, which is the Cauchy type with $\gamma = \theta$.

Example 7.8.3 For $X \sim UNIF(0, 1)$, where $F(x) = x$, $0 < x < 1$, we should expect a Type III limiting distribution. We have

$$n[1 - F(u_n)] = n(1 - u_n) = 1$$

which gives $u_n = 1 - 1/n$. Thus, $b_n = x_0 = 1$ and $a_n = x_0 - u_n = 1/n$. Checking condition 3 of Theorem 7.8.5,

$$\lim_{y\to0^-}\frac{1 - F(ky + x_0)}{1 - F(y + x_0)} = \lim_{y\to0^-}\frac{1 - (ky + x_0)}{1 - (y + x_0)} = \lim_{y\to0^-}\frac{-ky}{-y} = k$$

so the limiting distribution of $Y_n = n(X_{n:n} - 1)$ is Type III with $\gamma = 1$. Again, if we look directly at Theorem 7.8.4 to further illustrate, we have, for $y < 0$,

$$\lim_{n\to\infty} n[1 - F(a_n y + b_n)] = \lim_{n\to\infty} n\left[1 - \left(\frac{y}{n} + 1\right)\right] = -y = -\ln G(y) \tag{7.8.21}$$

and $Y_n = n(X_{n:n} - 1) \xrightarrow{d} Y \sim G_3(y)$, where

$$G_3(y) = \begin{cases} e^{y} & y < 0 \\ 1 & y \ge 0 \end{cases}$$

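The three examples can be checked numerically. The idea is to evaluate the exact CDF (7.8.4), $[F(a_n y + b_n)]^n$, with the standardizing constants just derived and compare it with the claimed limit. A minimal standard-library sketch follows. The parameter value $\theta = 2$ (used in Examples 7.8.1 and 7.8.2) and the evaluation points are arbitrary illustrative choices.

```python
import math

def exact_max_cdf(cdf, a_n, b_n, n, y):
    """Exact CDF of Y_n = (X_{n:n} - b_n) / a_n, eq. (7.8.4): [F(a_n y + b_n)]^n."""
    p = cdf(a_n * y + b_n)
    return 0.0 if p <= 0.0 else math.exp(n * math.log(p))

THETA = 2.0  # illustrative parameter for Examples 7.8.1 and 7.8.2

def exp_cdf(x):      # EXP(theta), Example 7.8.1
    return 1.0 - math.exp(-x / THETA) if x > 0 else 0.0

def pareto_cdf(x):   # F(x) = 1 - x^(-theta), x > 1, Example 7.8.2
    return 1.0 - x ** (-THETA) if x > 1 else 0.0

def unif_cdf(x):     # UNIF(0, 1), Example 7.8.3
    return min(max(x, 0.0), 1.0)

# name: (cdf, a_n, b_n, limiting CDF, evaluation point y)
cases = {
    "Example 7.8.1 (Type I)": (exp_cdf, lambda n: THETA, lambda n: THETA * math.log(n),
                               lambda y: math.exp(-math.exp(-y)), 0.5),
    "Example 7.8.2 (Type II)": (pareto_cdf, lambda n: n ** (1 / THETA), lambda n: 0.0,
                                lambda y: math.exp(-y ** (-THETA)), 1.5),
    "Example 7.8.3 (Type III)": (unif_cdf, lambda n: 1.0 / n, lambda n: 1.0,
                                 lambda y: math.exp(y) if y < 0 else 1.0, -0.5),
}

gaps = {}
for name, (cdf, a, b, limit, y) in cases.items():
    for n in (10, 100, 10_000):
        gaps[(name, n)] = abs(exact_max_cdf(cdf, a(n), b(n), n, y) - limit(y))
        print(f"{name}, n = {n:>6}: |G_n(y) - G(y)| at y = {y} is {gaps[(name, n)]:.2e}")
```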

LIMITING DISTRIBUTIONS OF MINIMUMS 


If a nondegenerate limiting distribution exists for the minimum of a random sample, then it also will be one of three possible types. Indeed, the distribution of a minimum can be related to the distribution of a maximum, because

$$\min(x_1, \ldots, x_n) = -\max(-x_1, \ldots, -x_n) \tag{7.8.22}$$

Thus, all the results on maximums can be modified to apply to minimums if the details can be sorted out.

Let $X$ be continuous, $X \sim F_X(x)$, and let $Z = -X \sim F_Z(z) = 1 - F_X(-z)$. Note also that $X_{1:n} = -Z_{n:n}$.

Now consider $W_n = (X_{1:n} + b_n)/a_n$. We have

$$G_{W_n}(w) = P\left[\frac{X_{1:n} + b_n}{a_n} \le w\right] = P\left[\frac{-Z_{n:n} + b_n}{a_n} \le w\right] = P\left[\frac{Z_{n:n} - b_n}{a_n} \ge -w\right] = P[Y_n \ge -w] = 1 - G_{Y_n}(-w)$$

The limiting distribution of $W_n$, say $H(w)$, then is given by

$$H(w) = \lim_{n\to\infty} G_{W_n}(w) = \lim_{n\to\infty}[1 - G_{Y_n}(-w)] = 1 - G(-w)$$

where $G(y)$ now denotes the limiting distribution of $Y_n = (Z_{n:n} - b_n)/a_n$. Thus to find $H(w)$, the limiting distribution for a minimum, the first step is to determine

$$F_Z(z) = 1 - F_X(-z)$$

then determine $a_n$, $b_n$, and the limiting distribution $G(y)$ by the methods described for maximums as applied to $F_Z(z)$. Then the limiting distribution for $W_n$ is

$$H(w) = 1 - G(-w) \tag{7.8.23}$$
Note that if $F_X(x)$ belongs to one limiting type, it is possible that $F_Z(z)$ will belong to a different type. For example, maximums from $EXP(\theta)$ have a Type I limiting distribution, whereas $F_Z(z)$ in this case has a Type III limiting distribution, so the limiting distribution of the minimum will be a transformed Type III distribution.

In summary, a straightforward procedure for determining $a_n$, $b_n$, and $H(w)$ is first to find $F_Z(z)$ and apply the methods for maximums to determine $G(y)$ for $Y_n = (Z_{n:n} - b_n)/a_n$, and then to use equation (7.8.23) to obtain $H(w)$. It also is possible to express the results directly in terms of the original distribution $F_X(x)$.


Definition 7.8.2 The smallest characteristic value is the value $s_n$ defined by

$$nF_X(s_n) = 1 \tag{7.8.24}$$

It follows from equation (7.8.22) that $s_n(x) = -u_n(z)$. Similarly, the condition $F_Z(a_n + u_n(z)) = 1 - 1/(ne)$ becomes $F_X(-a_n + s_n) = 1/(ne)$, and so on.


Theorem 7.8.6 If $W_n = (X_{1:n} + b_n)/a_n$ has a limiting distribution $H(w)$, then $H(w)$ must be one of the following three types of extreme-value distributions:

1. Type I (for minimums) (Exponential type)
In this case, $b_n = -s_n$, $a_n$ is defined by $F_X(s_n - a_n) = 1/(ne)$,
$$W_n = \frac{X_{1:n} - s_n}{a_n}$$
and
$$H_1(w) = 1 - G_1(-w) = 1 - \exp(-e^{w}) \qquad -\infty < w < \infty$$
if and only if $\lim_{n\to\infty} nF_X(s_n - a_n y) = e^{-y}$.

2. Type II (for minimums) (Cauchy type)
In this case, $a_n = -s_n$, $b_n = 0$, $W_n = -X_{1:n}/s_n$, and
$$H_2(w) = 1 - G_2(-w) = 1 - \exp[-(-w)^{-\gamma}] \qquad w < 0,\ \gamma > 0$$
if and only if
$$\lim_{y\to-\infty}\frac{F_X(y)}{F_X(ky)} = k^{\gamma} \qquad k > 0,\ \gamma > 0$$
or
$$\lim_{n\to\infty} nF_X(s_n y) = y^{-\gamma} \qquad y > 0$$

3. Type III (for minimums) (Limited type)
If $x_1 = \inf\{x \mid F_X(x) > 0\}$ denotes the lower limit for $x$ (that is, $x_1 = -x_0$, where $x_0$ is the upper limit for $Z = -X$), then
$$b_n = -x_1, \qquad a_n = s_n - x_1, \qquad W_n = \frac{X_{1:n} - x_1}{s_n - x_1}$$
and
$$H_3(w) = 1 - G_3(-w) = 1 - \exp(-w^{\gamma}) \qquad w > 0,\ \gamma > 0$$
if and only if
$$\lim_{y\to0^+}\frac{F_X(ky + x_1)}{F_X(y + x_1)} = k^{\gamma} \qquad k > 0$$
or
$$\lim_{n\to\infty} nF_X[(x_1 - s_n)y + x_1] = (-y)^{\gamma} \qquad y < 0$$


Note that the Type I distribution for minimums is known as the Type I extreme-value distribution. Also, the Type III distribution for minimums is a Weibull distribution. Recall that the limiting distribution for maximums is Type I for many of the common densities. In determining the type of limiting distribution of the minimum, it is necessary to consider the thickness of the right-hand tail of $F_Z(z)$, where $Z = -X$. Thus the limiting distribution of the minimum for some of these common densities, such as the exponential and gamma, belongs to Type III. This may be one reason that the Weibull distribution often is encountered in applications.


We now consider the minimum of a random sample of size $n$ from $EXP(\theta)$. We already know in this case that $X_{1:n} \sim EXP(\theta/n)$, and so $nX_{1:n}/\theta \sim EXP(1)$. Thus the limiting distribution of $nX_{1:n}/\theta$ is also $EXP(1)$, which is the Type III case with $\gamma = 1$. If we did not know the answer, then we would guess that the limiting distribution was Type III, because the range of the variable $Z = -X$ is limited on the right. Checking condition 3 in Theorem 7.8.6, we have $x_1 = 0$ and, by L'Hospital's rule,

$$\lim_{y\to0^+}\frac{F_X(ky + x_1)}{F_X(y + x_1)} = \lim_{y\to0^+}\frac{1 - \exp(-ky/\theta)}{1 - \exp(-y/\theta)} = \lim_{y\to0^+}\frac{k\exp(-ky/\theta)}{\exp(-y/\theta)} = k$$

Thus, we know that $H_3(w) = 1 - e^{-w}$, where

$$W_n = \frac{X_{1:n} - x_1}{s_n - x_1} = \frac{X_{1:n}}{s_n}$$

In this case, $s_n$ is given by

$$F_X(s_n) = 1 - e^{-s_n/\theta} = \frac{1}{n} \qquad\text{or}\qquad s_n = -\theta\ln\left(1 - \frac{1}{n}\right)$$

This does not yield identically the same standardizing constant as suggested earlier. However, the results are consistent because

$$\lim_{n\to\infty}\frac{-\ln(1 - 1/n)}{1/n} = 1$$

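For this example the exact distribution is available, so the standardization $W_n = X_{1:n}/s_n$ can be checked directly. The standard-library sketch below does three things. It computes $P[W_n \le w]$ exactly from $X_{1:n} \sim EXP(\theta/n)$, estimates the same probability by simulation using identity (7.8.22) for the minimum, and shows $ns_n/\theta \to 1$. The function names, $\theta = 5$, and the sample sizes are illustrative assumptions.

```python
import math
import random

def smallest_characteristic_value(theta, n):
    """s_n for EXP(theta), from F(s_n) = 1/n, eq. (7.8.24): s_n = -theta ln(1 - 1/n)."""
    return -theta * math.log1p(-1.0 / n)

def exact_cdf_w(w, n):
    """P[W_n <= w] for W_n = X_{1:n}/s_n, using X_{1:n} ~ EXP(theta/n).

    P[X_{1:n} <= s_n w] = 1 - exp(-n s_n w / theta) = 1 - (1 - 1/n)^(n w);
    theta cancels, so it is not an argument here.
    """
    return 1.0 - math.exp(n * w * math.log1p(-1.0 / n))

def simulated_cdf_w(w, n, theta=5.0, reps=10_000, seed=3):
    """Monte Carlo estimate of P[W_n <= w]; theta = 5 is an arbitrary choice."""
    rng = random.Random(seed)
    s_n = smallest_characteristic_value(theta, n)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0 / theta) for _ in range(n)]  # rate 1/theta, mean theta
        smallest = -max(-x for x in sample)                          # identity (7.8.22)
        if smallest / s_n <= w:
            hits += 1
    return hits / reps

limit = 1.0 - math.exp(-1.0)  # H_3(1) = 1 - e^{-1}
print(f"n = 20: simulated {simulated_cdf_w(1.0, 20):.4f}, "
      f"exact {exact_cdf_w(1.0, 20):.4f}, limit {limit:.4f}")
for n in (10, 1_000, 1_000_000):
    ratio = n * smallest_characteristic_value(1.0, n)   # n s_n / theta with theta = 1
    print(f"n = {n:>9}: n s_n / theta = {ratio:.7f}, exact P[W_n <= 1] = {exact_cdf_w(1.0, n):.6f}")
```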
SUMMARY 


The purpose of this chapter was to introduce and develop the notions of convergence in distribution, limiting distributions, and convergence in probability. These concepts are important in studying the asymptotic behavior of sequences of random variables and their distributions.

The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) deal 
with the limiting behavior of certain functions of the sample mean as the sample 
size approaches infinity. Specifically, the LLN asserts that a sequence of sample 
means converges stochastically to the population mean under certain mild condi- 
tions. This type of convergence is also equivalent to convergence in probability in 
this case, because the limit is constant. Under certain conditions, the CLT asserts 
that a suitably transformed sequence of sample means has a normal limiting 
distribution. These theorems have important theoretical implications in probabil- 
ity and statistics, and they also provide useful approximations in many applied 
situations. For example, the CLT yields a very good approximation for the bino- 
mial distribution. 


EXERCISES 


1. Consider a random sample of size $n$ from a distribution with CDF $F(x) = 1 - 1/x$ if $1 \le x < \infty$, and zero otherwise.

(a) Derive the CDF of the smallest order statistic, $X_{1:n}$.

(b) Find the limiting distribution of $X_{1:n}$.

(c) Find the limiting distribution of $X_{1:n}^n$.


2. Consider a random sample of size $n$ from a distribution with CDF $F(x) = (1 + e^{-x})^{-1}$ for all real $x$.

(a) Does the largest order statistic, $X_{n:n}$, have a limiting distribution?

(b) Does $X_{n:n} - \ln n$ have a limiting distribution? If so, what is it?
3. Consider a random sample of size $n$ from a distribution with CDF $F(x) = 1 - x^{-2}$ if $x > 1$, and zero otherwise. Determine whether each of the following sequences has a limiting distribution; if so, then give the limiting distribution.

(a) $X_{1:n}$

(b) $X_{n:n}$

(c) $n^{-1/2}X_{n:n}$


4. Let $X_1, X_2, \ldots$ be independent Bernoulli random variables, $X_i \sim BIN(1, p_i)$, and let $Y_n = \sum_{i=1}^{n}(X_i - p_i)/n$. Show that the sequence $Y_1, Y_2, \ldots$ converges stochastically to $c = 0$ as $n \to \infty$. Hint: Use the Chebychev inequality.
5. Suppose that $Z_i \sim N(0, 1)$ and that $Z_1, Z_2, \ldots$ are independent. Use moment generating functions to find the limiting distribution of $\sum_{i=1}^{n}(Z_i + 1/n)/\sqrt{n}$.
6. Show that the limit in equation (7.3.2) is still correct if the assumption $np = \mu$ is replaced by the weaker assumption $np \to \mu$ as $n \to \infty$.


7. Consider a random sample from a Weibull distribution, $X_i \sim WEI(1, 2)$. Find approximate values $a$ and $b$ such that for $n = 35$:

(a) $P[a < \bar X < b] = 0.95$.

(b) $P[a < \tilde X < b] = 0.95$, where $\tilde X = X_{18:35}$ is the sample median.
8. In Exercise 2 of Chapter 5, a carton contains 144 baseballs, each of which has a mean weight of 5 ounces and a standard deviation of 2/5 ounces. Use the Central Limit Theorem to approximate the probability that the total weight of the baseballs in the carton is at most 725 ounces.


9. Let $X_1, X_2, \ldots, X_{100}$ be a random sample from an exponential distribution, $X_i \sim EXP(1)$, and let $Y = X_1 + X_2 + \cdots + X_{100}$.

(a) Give an approximation for $P[Y > 110]$.

(b) If $\bar X$ is the sample mean, then approximate $P[1.1 < \bar X < 1.2]$.


10. Assume $X_n \sim GAM(1, n)$ and let $Z_n = (X_n - n)/\sqrt{n}$. Show that $Z_n \xrightarrow{d} Z \sim N(0, 1)$. Hint: Show that $M_{Z_n}(t) = \exp(-\sqrt{n}\,t - n\ln(1 - t/\sqrt{n}))$ and then use the expansion $\ln(1 - s) = -s - (1 + \varepsilon)s^2/2$, where $\varepsilon \to 0$ as $s \to 0$. Does the above limiting distribution also follow as a result of the CLT? Explain your answer.


11. Let $X_i \sim UNIF(0, 1)$, where $X_1, X_2, \ldots, X_{20}$ are independent. Find normal approximations for each of the following:

(a) $P\left[\sum_{i=1}^{20} X_i \le 12\right]$.

(b) The 90th percentile of $\sum_{i=1}^{20} X_i$.


12. A certain type of weapon has probability $p$ of working successfully. We test $n$ weapons, and the stockpile is replaced if the number of failures, $X$, is at least one. How large must $n$ be to have $P[X \ge 1] = 0.99$ when $p = 0.95$?

(a) Use exact binomial.

(b) Use normal approximation.

(c) Use Poisson approximation.

(d) Rework (a) through (c) with $p = 0.90$.
13. Suppose that $Y_n \sim NB(n, p)$. Give a normal approximation for $P[Y_n \le y]$ for large $n$. Hint: $Y_n$ is distributed as the sum of $n$ independent geometric random variables.


14. For the sequence $Y_n$ of Exercise 13:
(a) Show that $Y_n/n$ converges stochastically to $1/p$, using Theorem 7.6.2.
(b) Rework (a) using Theorem 7.6.3.


15. Let $W_i$ be the weight of the $i$th airline passenger's luggage. Assume that the weights are independent, each with pdf

$$f(w) = \theta B^{-\theta} w^{\theta - 1} \qquad 0 < w < B$$

and zero otherwise.

(a) For $n = 100$, $\theta = 3$, and $B = 80$, approximate $P\left[\sum_{i=1}^{100} W_i > \ldots\right]$.

(b) If $W_{1:n}$ is the smallest value out of $n$, then show that $W_{1:n} \xrightarrow{P} 0$ as $n \to \infty$.

(c) If $W_{n:n}$ is the largest value out of $n$, then show that $W_{n:n} \xrightarrow{P} B$ as $n \to \infty$.

(d) Find the limiting distribution of $(W_{n:n}/B)^n$.

(e) Find the asymptotic normal distribution of the median, $W_{k:n}$, where $k/n \to 0.5$ with $k - 0.5n$ bounded.

(f) To what does $W_{k:n}$ of (e) converge stochastically?

(g) What is the limiting distribution of $n^{1/\theta}W_{1:n}/B$?


16. Consider a random sample from a Poisson distribution, $X_i \sim POI(\mu)$.
(a) Show that $Y_n = e^{-\bar X_n}$ converges stochastically to $P[X = 0] = e^{-\mu}$.
(b) Find the asymptotic normal distribution of $Y_n$.
(c) Show that $\bar X_n\exp(-\bar X_n)$ converges stochastically to $P[X = 1] = \mu e^{-\mu}$.


17. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a normal distribution, $X_i \sim N(\mu, \sigma^2)$, and let $\tilde X_n$ be the sample median. Find constants $m$ and $c$ such that $\tilde X_n$ is asymptotically normal $N(m, c^2/n)$.
18. In Exercise 1, find the limiting distribution of $n\ln X_{1:n}$.
19. In Exercise 2, find the limiting distribution of $(1/n)\exp(X_{n:n})$.


20. Under the assumptions of Theorem 7.5.1:
(a) Show that $X_{k:n}$ converges stochastically to $x_p$.
(b) Show that $F(X_{k:n}) \xrightarrow{P} p$ if $F(x)$ is continuous.


21. As noted in the chapter, convergence of a sequence of real numbers to a limit can be regarded as a special case of stochastic convergence. That is, if $P[Y_n = c_n] = 1$ for each $n$, and $c_n \to c$, then $Y_n \xrightarrow{P} c$. Use this along with Theorem 7.7.3 to show that if $a_n \to a$, $b_n \to b$, and $Y_n \xrightarrow{P} c$, then $a_n + b_nY_n \xrightarrow{P} a + bc$.
22. Consider the sequence of independent Bernoulli variables in Exercise 4. Show that if $\sum_{i=1}^{n} p_i/n \to 1$ as $n \to \infty$, then $\sum_{i=1}^{n} X_i/n \xrightarrow{P} 1$.


23. For the sequence of random variables $Z_n$ of Exercise 10, if $Y_n$ is another sequence such that $Y_n \xrightarrow{P} c$ and if $W_n = Y_nZ_n$, does $W_n$ have a limiting distribution? If so, what is it? What is the limiting distribution of $Z_n/Y_n$?


24. Use the normal approximation to work Exercise 6 of Chapter 3.


25. Use the theorems of Section 7.8 to determine the standardizing constants and the appropriate extreme-value distributions for each of the following:

(a) $X_{1:n}$ and $X_{n:n}$, where $F(x) = (1 + e^{-x})^{-1}$.

(b) $X_{1:n}$ and $X_{n:n}$, where $X_i \sim WEI(\theta, \beta)$.

(c) $X_{1:n}$ and $X_{n:n}$, where $X_i \sim EV(\theta, \eta)$.

(d) $X_{1:n}$ and $X_{n:n}$, where $X_i \sim PAR(\theta, \kappa)$.


26. Consider the CDF of Exercise 1. Find the limiting extreme-value distribution of $X_{1:n}$ and compare this result to the results of Exercise 1.


27. Consider a random sample from a gamma distribution, $X_i \sim GAM(\theta, \kappa)$. Determine the limiting extreme-value distribution of $X_{1:n}$.


28. Consider a random sample from a Cauchy distribution, $X_i \sim CAU(1, 0)$. Determine the type of limiting extreme-value distribution of $X_{n:n}$.


STATISTICS AND SAMPLING DISTRIBUTIONS


8.1 INTRODUCTION


In Chapter 4, the notion of random sampling was presented. The empirical dis- 
tribution function was used to provide a rationale for the sample mean and 
sample variance as intuitive estimates of the mean and variance of the population 
distribution. The purpose of this chapter is to introduce the concept of a statistic, 
which includes the sample mean and sample variance as special cases, and to 
derive properties of certain statistics that play an important role in later chapters. 


8.2 STATISTICS


Consider a set of observable random variables $X_1, \ldots, X_n$. For example, suppose the variables are a random sample of size $n$ from a population.
Definition 8.2.1 A function of observable random variables, $T = 𝓉(X_1, \ldots, X_n)$, which does not depend on any unknown parameters, is called a statistic.

In our notation, script $𝓉$ is the function that we apply to $X_1, \ldots, X_n$ to define the statistic, which is denoted by capital $T$.

It is required that the variables be observable because of the intended use of a statistic. The intent is to make inferences about the distribution of the set of random variables. If the variables are not observable, or if the function $𝓉(x_1, \ldots, x_n)$ depends on unknown parameters, then $T$ would not be useful in making such inferences. For example, consider the data of Example 4.6.3, which were obtained by observing the lifetimes of 40 randomly selected electrical parts. It is reasonable to assume that they are the observed values of a random sample of size 40 from the population of all such parts. Typically, such a population will have one or more unknown parameters, such as an unknown population mean, say $\mu$. To make an inference about the population, suppose it is necessary to numerically evaluate some function of the data that also depends on the unknown parameter, such as $𝓉(x_1, \ldots, x_{40}) = (x_1 + \cdots + x_{40})/\mu$ or $x_{1:40} - \mu$. Of course, such computations would be impossible, because $\mu$ is unknown, and these functions would not be suitable for defining statistics.

Also note that, in general, the set of observable random variables need not be a random sample. For example, the set of ordered random variables $Y_1, \ldots, Y_{10}$ of Example 6.5.6 is not a random sample. However, a function of these variables that does not depend on unknown parameters, such as $𝓉(y_1, \ldots, y_{10}) = (y_1 + \cdots + y_{10}) + 3y_{10}$, would be a statistic.

Most of the discussion in the chapters that follow will involve random samples.


Let $X_1, \ldots, X_n$ represent a random sample from a population with pdf $f(x)$. The sample mean, as defined in Chapter 4, provides an example of a statistic with the function $𝓉(x_1, \ldots, x_n) = (x_1 + \cdots + x_n)/n$.

This statistic usually is denoted by

$$\bar X = \sum_{i=1}^{n}\frac{X_i}{n} \tag{8.2.1}$$

When a random sample is observed, the value of $\bar X$ computed from the data usually is denoted by lowercase $\bar x$. As noted in Chapter 4, $\bar x$ is useful as an estimate of the population mean, $\mu = E(X)$.


The following theorem provides important properties of the sample mean. 


Theorem 8.2.1 If $X_1, \ldots, X_n$ denotes a random sample from $f(x)$ with $E(X) = \mu$ and $Var(X) = \sigma^2$, then

$$E(\bar X) = \mu \tag{8.2.2}$$

and

$$Var(\bar X) = \frac{\sigma^2}{n} \tag{8.2.3}$$

Property (8.2.2) indicates that if the sample mean is used to estimate the population mean, then the values of sample estimates will, on the average, be the population mean $\mu$. Of course, for any one sample the value $\bar x$ may differ substantially from $\mu$. A statistic with this property is said to be unbiased for the parameter it is intended to estimate. This property will receive more attention in the next chapter.

An important special case of the theorem occurs when the population distribution is Bernoulli.

Example 8.2.2 Consider the random variables $X_1, \ldots, X_n$ of Example 5.2.1, which we can regard as a random sample of size $n$ from a Bernoulli distribution, $X_i \sim BIN(1, p)$. The Bernoulli distribution provides a model for a dichotomous or two-valued population. The mean and variance of such a population are $\mu = p$ and $\sigma^2 = pq$, respectively, where, as usual, $q = 1 - p$. The sample mean in this case is $\bar X = Y/n$, where $Y$ is the binomial variable of Example 5.2.1, and it usually is called the sample proportion, denoted $\hat p = Y/n$.
It is rather straightforward to show that $\hat p$ is an unbiased estimate of $p$,

$$E(\hat p) = p \tag{8.2.4}$$

and that

$$Var(\hat p) = \frac{pq}{n} \tag{8.2.5}$$

As noted earlier, the binomial distribution provides a model for the situation of sampling with replacement. In Example 5.2.2, the comparable result for sampling without replacement was considered, with $Y \sim HYP(n, M, N)$.

In that example, suppose that we want to estimate $M/N$, the proportion of defective components in the population, based on the sample proportion, $Y/n$. We know the mean and variance of $Y$ from equations (5.2.20) and (5.2.21). Specifically,

$$E\left(\frac{Y}{n}\right) = \frac{M}{N} = p \tag{8.2.6}$$

which means that $Y/n$ is unbiased for $p$, and $Var(Y/n)$ is shown easily to approach zero as $n$ increases. Actually, in this example, it is possible for the variance to attain the value 0 if $n = N$, which means that the entire population has been inspected.

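The claims that $Y/n$ is unbiased and that its variance vanishes at $n = N$ can be confirmed by exact computation from the hypergeometric pmf. In the sketch below, the population values $N = 20$, $M = 6$ are arbitrary. The comparison formula $Var(Y/n) = \frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}$ is the standard hypergeometric variance, assumed here rather than restated in this section. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from math import comb

def proportion_moments(n, M, N):
    """Exact E(Y/n) and Var(Y/n) for Y ~ HYP(n, M, N), summed over the pmf."""
    total = comb(N, n)
    support = range(max(0, n - (N - M)), min(n, M) + 1)
    pmf = {y: Fraction(comb(M, y) * comb(N - M, n - y), total) for y in support}
    mean = sum(Fraction(y, n) * prob for y, prob in pmf.items())
    var = sum((Fraction(y, n) - mean) ** 2 * prob for y, prob in pmf.items())
    return mean, var

N, M = 20, 6            # illustrative population size and number of defectives
p = Fraction(M, N)
for n in (1, 5, 10, 19, 20):
    mean, var = proportion_moments(n, M, N)
    formula = p * (1 - p) / n * Fraction(N - n, N - 1)
    print(f"n = {n:>2}: E(Y/n) = {mean}, Var(Y/n) = {float(var):.5f}, "
          f"formula = {float(formula):.5f}")
```

At $n = N$ the only possible outcome is $Y = M$, so the printed variance is exactly zero.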

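The function $𝓉(x_1, \ldots, x_n) = [(x_1 - \bar x)^2 + \cdots + (x_n - \bar x)^2]/(n - 1)$, when applied to data, corresponds to the sample variance, which was discussed in Chapter 4. Specifically, the sample variance is given by

$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{n - 1} \tag{8.2.7}$$

The following alternate forms may be obtained by expanding the square:

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2/n}{n - 1} \tag{8.2.8}$$

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar X^2}{n - 1} \tag{8.2.9}$$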
The following theorem provides important properties of the sample variance. 


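If $X_1, \ldots, X_n$ denotes a random sample of size $n$ from $f(x)$ with $E(X) = \mu$ and $Var(X) = \sigma^2$, then

$$E(S^2) = \sigma^2 \tag{8.2.10}$$

$$Var(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right) \qquad n > 1 \tag{8.2.11}$$

where $\mu_4 = E[(X - \mu)^4]$.

Proof

Consider property (8.2.10). Based on equation (8.2.9), we have

$$E(S^2) = E\left[\frac{\sum_{i=1}^n X_i^2 - n\bar X^2}{n - 1}\right] = \frac{1}{n - 1}\left[\sum_{i=1}^n E(X_i^2) - nE(\bar X^2)\right]$$

$$= \frac{1}{n - 1}\left[n(\mu^2 + \sigma^2) - n\left(\mu^2 + \frac{\sigma^2}{n}\right)\right] = \frac{1}{n - 1}(n - 1)\sigma^2 = \sigma^2$$

The proof of equation (8.2.11) is omitted.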
According to property (8.2.10), the sample variance provides another example of an unbiased statistic, and this is the principal reason for using the divisor $n - 1$ rather than $n$.

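To see why the divisor $n - 1$ matters, the following simulation compares $S^2$ with the divisor-$n$ version. It is a sketch under stated assumptions: normal data with arbitrarily chosen $\mu = 10$, $\sigma = 2$, $n = 5$, and a hypothetical helper name. For normal data $\mu_4 = 3\sigma^4$, so (8.2.11) reduces to $Var(S^2) = 2\sigma^4/(n - 1) = 8$ here. The divisor-$n$ estimator has mean $(n - 1)\sigma^2/n = 3.2$.

```python
import random
import statistics

def sample_variance_study(mu=10.0, sigma=2.0, n=5, reps=20_000, seed=7):
    """Simulate S^2 (divisor n - 1) and the divisor-n variance for N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    s2_values, divisor_n_values = [], []
    for _ in range(reps):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(data) / n
        ss = sum((x - xbar) ** 2 for x in data)
        s2_values.append(ss / (n - 1))      # S^2 as in (8.2.7)
        divisor_n_values.append(ss / n)     # biased alternative
    return (statistics.fmean(s2_values),
            statistics.fmean(divisor_n_values),
            statistics.variance(s2_values))

mean_s2, mean_div_n, var_s2 = sample_variance_study()
print(f"average S^2           = {mean_s2:.3f}  (sigma^2 = 4)")
print(f"average divisor-n est = {mean_div_n:.3f}  ((n-1) sigma^2 / n = 3.2)")
print(f"variance of S^2       = {var_s2:.3f}  (2 sigma^4 / (n-1) = 8)")
```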

SAMPLING DISTRIBUTIONS

A statistic is also a random variable. Its distribution depends on the distribution of a random sample and on the form of the function $𝓉(x_1, x_2, \ldots, x_n)$. The distribution of a statistic sometimes is referred to as a derived distribution or sampling distribution, in contrast to the population distribution.

Many important statistics can be expressed as linear combinations of independent normal random variables.

LINEAR COMBINATIONS OF NORMAL VARIABLES

Theorem 8.3.1 If $X_i \sim N(\mu_i, \sigma_i^2)$; $i = 1, \ldots, n$ denote independent normal variables, then

$$Y = \sum_{i=1}^{n} a_i X_i \sim N\!\left(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\right) \tag{8.3.1}$$

Proof

$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(a_i t) = \prod_{i=1}^{n} e^{a_i t\mu_i + a_i^2 t^2\sigma_i^2/2} = \exp\left[t\sum_{i=1}^{n} a_i\mu_i + \frac{t^2}{2}\sum_{i=1}^{n} a_i^2\sigma_i^2\right]$$

which is the MGF of a normal variable with mean $\sum a_i\mu_i$ and variance $\sum a_i^2\sigma_i^2$.

Corollary 8.3.1 If $X_1, \ldots, X_n$ denotes a random sample from $N(\mu, \sigma^2)$, then $\bar X \sim N(\mu, \sigma^2/n)$.

Example 8.3.1 In the situation of Example 3.3.4, we wish to investigate the claim that $X \sim N(60, 36)$, so 25 batteries are life-tested and the average of the survival times of the 25 batteries is computed. If the claim is true, the average life of 25 batteries
should exceed what value 95% of the time? We have $E(\bar X) = 60$, $Var(\bar X) = 36/25$, and

$$P[\bar X > c] = 1 - \Phi\left(\frac{c - 60}{6/5}\right) = 0.95$$

so $(c - 60)/(6/5) = -1.645$ and $c = 58.026$ months.

In general, for a prescribed probability level $1 - \alpha$, one would have

$$c = \mu + z_{\alpha}\frac{\sigma}{\sqrt{n}} \tag{8.3.2}$$

or in terms of percentiles directly available in Table 3 (Appendix C) for small $\alpha$, one could write

$$c = \mu - z_{1-\alpha}\frac{\sigma}{\sqrt{n}} \tag{8.3.3}$$

Thus a reasonable procedure would be to accept the claim if the observed $\bar x \ge 58.026$, but to disbelieve or reject the claim if $\bar x < 58.026$, because that should happen with small probability (0.05) if the claim is true. If one wished to be more certain before rejecting, then a smaller $\alpha$, say $\alpha = 0.01$, could be used to determine the critical value $c$. This test procedure favors the consumer, because it does not reject when a large mean life is indicated. An appropriate test for the other direction (or both directions) also could be constructed.

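The critical value in this example can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). The sketch below evaluates both forms, (8.3.2) and (8.3.3). The helper names are our own.

```python
from statistics import NormalDist

def lower_critical_value(mu, sigma, n, alpha):
    """c with P[Xbar > c] = 1 - alpha when Xbar ~ N(mu, sigma^2/n), as in (8.3.2)."""
    z_alpha = NormalDist().inv_cdf(alpha)   # lower alpha percentile, negative for small alpha
    return mu + z_alpha * sigma / n ** 0.5

def lower_critical_value_table_form(mu, sigma, n, alpha):
    """The same value written with the upper percentile z_{1-alpha}, as in (8.3.3)."""
    return mu - NormalDist().inv_cdf(1 - alpha) * sigma / n ** 0.5

for alpha in (0.05, 0.01):
    c = lower_critical_value(60, 6, 25, alpha)
    print(f"alpha = {alpha}: c = {c:.3f} months")
```

With $\alpha = 0.05$ this gives $c \approx 58.026$, and the stricter $\alpha = 0.01$ lowers the critical value to about 57.208.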

Consider two independent random samples $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ with respective sample sizes $n_1$ and $n_2$, from normally distributed populations, $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$, and denote by $\bar X$ and $\bar Y$ the sample means. It follows from Theorem 8.3.1 that the difference also is normally distributed, $\bar X - \bar Y \sim N(\mu_1 - \mu_2, \sigma_1^2/n_1 + \sigma_2^2/n_2)$. It is clear that the first $n_1$ terms of the difference have coefficient $a_i = 1/n_1$, and the last $n_2$ terms have coefficient $a_i = -1/n_2$. Consequently, the mean of the difference is $n_1(1/n_1)\mu_1 + n_2(-1/n_2)\mu_2 = \mu_1 - \mu_2$, and the variance is $n_1(1/n_1)^2\sigma_1^2 + n_2(-1/n_2)^2\sigma_2^2 = \sigma_1^2/n_1 + \sigma_2^2/n_2$.


Certain additional properties involve a special case of the gamma distribution. 


CHI-SQUARE DISTRIBUTION 


Consider a special gamma distribution with $\theta = 2$ and $\kappa = v/2$. The variable $Y$ is said to follow a chi-square distribution with $v$ degrees of freedom if $Y \sim GAM(2, v/2)$. A special notation for this is

$$Y \sim \chi^2(v) \tag{8.3.4}$$

Theorem 8.3.2 If $Y \sim \chi^2(v)$, then

$$M_Y(t) = (1 - 2t)^{-v/2} \tag{8.3.5}$$

$$E(Y^r) = 2^r\frac{\Gamma(v/2 + r)}{\Gamma(v/2)} \tag{8.3.6}$$

$$E(Y) = v \tag{8.3.7}$$

$$Var(Y) = 2v \tag{8.3.8}$$

Proof

These results follow from the corresponding properties of the gamma distribution.

The cumulative chi-square distribution has been extensively tabulated in the literature. In most cases percentiles, $\chi^2_\gamma(v)$, are provided for particular $\gamma$ levels of interest and for different values of $v$. Specifically, if $Y \sim \chi^2(v)$, then $\chi^2_\gamma(v)$ is the value such that

$$P[Y \le \chi^2_\gamma(v)] = \gamma \tag{8.3.9}$$

Values of $\chi^2_\gamma(v)$ are provided in Table 4 (Appendix C) for various values of $\gamma$ and $v$. These values also can be used to obtain percentiles for the gamma distribution.

Theorem 8.3.3 If $X \sim GAM(\theta, \kappa)$, then $Y = 2X/\theta \sim \chi^2(2\kappa)$.

Proof

$$M_Y(t) = M_{2X/\theta}(t) = M_X(2t/\theta) = (1 - 2t)^{-2\kappa/2}$$

which is the MGF of a chi-square distribution with $2\kappa$ degrees of freedom.
The gamma CDF also can be expressed in terms of the chi-square notation. If $X \sim GAM(\theta, \kappa)$, and if $H(y; v)$ denotes a chi-square CDF with $v$ degrees of freedom, then

$$F_X(x) = H(2x/\theta;\ 2\kappa) \tag{8.3.10}$$

Cumulative chi-square probabilities $H(c; v)$ are provided in Table 5 (Appendix C) for various values of $c$ and $v$.


The time to failure (in years) of a certain type of component follows a gamma distribution with $\theta = 3$ and $\kappa = 2$. It is desired to determine a guarantee period for which 90% of the components will survive. That is, the 10th percentile, $x_{0.10}$, is desired such that $P[X \le x_{0.10}] = 0.10$. We find

$$P[X \le x_{0.10}] = H(2x_{0.10}/\theta; 2\kappa) = 0.10$$

Thus setting

$$\frac{2x_{0.10}}{\theta} = \chi^2_{0.10}(2\kappa)$$

gives

$$x_{0.10} = \frac{\theta\,\chi^2_{0.10}(2\kappa)}{2}$$

For $\theta = 3$ and $\kappa = 2$,

$$x_{0.10} = \frac{3\,\chi^2_{0.10}(4)}{2} = \frac{3(1.06)}{2} = 1.59 \text{ years}$$

It is clear in general that the $p$th percentile of the gamma distribution may be expressed as

$$x_p = \frac{\theta\,\chi^2_p(2\kappa)}{2} \tag{8.3.11}$$
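Equation (8.3.11) can be checked numerically without Table 4. The sketch below is a minimal stdlib-only illustration, not the book's method: the helper names `chi2_cdf`, `chi2_ppf`, and `gamma_percentile` are hypothetical, $H(c; \nu)$ is computed from the standard power series for the regularized lower incomplete gamma function, and the percentile is found by bisection.

```python
import math


def chi2_cdf(x: float, nu: float) -> float:
    """H(x; nu), the chi-square CDF, via the series for the regularized
    lower incomplete gamma function P(nu/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_ppf(gamma: float, nu: float) -> float:
    """chi^2_gamma(nu): the value c with H(c; nu) = gamma, by bisection."""
    lo, hi = 0.0, nu + 10.0 * math.sqrt(2.0 * nu) + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_percentile(p: float, theta: float, kappa: float) -> float:
    """p-th percentile of GAM(theta, kappa) from equation (8.3.11)."""
    return theta * chi2_ppf(p, 2.0 * kappa) / 2.0


# Guarantee-period example: theta = 3, kappa = 2, p = 0.10
x10 = gamma_percentile(0.10, theta=3.0, kappa=2.0)
print(f"x_0.10 = {x10:.4f} years")
```

Because the bisection uses the CDF itself rather than the two-decimal table entry $\chi^2_{0.10}(4) = 1.06$, the result (about 1.595 years) differs from the 1.59 years above only in the third decimal.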


The following theorem states the useful property that the sum of independent 
chi-square variables also is chi-square distributed. 


Theorem 8.3.4 If $Y_i \sim \chi^2(\nu_i)$, $i = 1, \ldots, n$, are independent chi-square variables, then

$$V = \sum_{i=1}^n Y_i \sim \chi^2\left(\sum_{i=1}^n \nu_i\right) \tag{8.3.12}$$

Proof

$$M_V(t) = (1 - 2t)^{-\nu_1/2} \cdots (1 - 2t)^{-\nu_n/2} = (1 - 2t)^{-\sum \nu_i/2}$$

which is the MGF of $\chi^2\left(\sum \nu_i\right)$. ∎

The following theorem establishes a connection between standard normal and chi-square variables.


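Theorem 8.3.5 If $Z \sim N(0, 1)$, then $Z^2 \sim \chi^2(1)$.

Proof

$$M_{Z^2}(t) = E[e^{tZ^2}] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{tz^2 - z^2/2}\,dz = \frac{1}{\sqrt{1 - 2t}} \int_{-\infty}^{\infty} \frac{\sqrt{1 - 2t}}{\sqrt{2\pi}} e^{-(1 - 2t)z^2/2}\,dz = (1 - 2t)^{-1/2}$$

which is the MGF of a chi-square distribution with one degree of freedom. ∎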
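Theorem 8.3.5 can be checked numerically: $P[Z^2 \le c] = P[-\sqrt{c} \le Z \le \sqrt{c}] = \operatorname{erf}(\sqrt{c/2})$ should equal $H(c; 1)$. The sketch below assumes the same hypothetical series-based `chi2_cdf` helper as in the earlier illustration and is not taken from the text; $c = 3.84$ corresponds to Exercise 11.

```python
import math


def chi2_cdf(x: float, nu: float) -> float:
    """H(x; nu) via the regularized lower incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def prob_z_squared_below(c: float) -> float:
    """P[Z^2 <= c] = P[-sqrt(c) <= Z <= sqrt(c)] for Z ~ N(0, 1)."""
    return math.erf(math.sqrt(c / 2.0))


for c in (0.5, 1.0, 2.706, 3.84, 6.635):
    normal_side = prob_z_squared_below(c)
    chi2_side = chi2_cdf(c, 1)
    print(f"c = {c:6.3f}: normal {normal_side:.6f}  chi-square(1) {chi2_side:.6f}")
```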


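Corollary 8.3.2 If $X_1, \ldots, X_n$ denotes a random sample from $N(\mu, \sigma^2)$, then

$$\sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n) \tag{8.3.13}$$

$$\frac{n(\bar X - \mu)^2}{\sigma^2} \sim \chi^2(1) \tag{8.3.14}$$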


The sample variance was discussed previously, and for a sample from a normal population its distribution can be related to a chi-square distribution. The sampling distribution of $S^2$ does not follow directly from the previous corollary because the terms $X_i - \bar X$ are not independent. Indeed, they are functionally dependent because $\sum_{i=1}^n (X_i - \bar X) = 0$.


Theorem 8.3.6 If $X_1, \ldots, X_n$ denotes a random sample from $N(\mu, \sigma^2)$, then

1. $\bar X$ and the terms $X_i - \bar X$, $i = 1, \ldots, n$, are independent.
2. $\bar X$ and $S^2$ are independent.
3. $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$. (8.3.15)
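Proof

To obtain part 1, first note that by adding and subtracting $\bar x$ and then expanding, we obtain the relationship

$$\sum_{i=1}^n \frac{(x_i - \mu)^2}{\sigma^2} = \sum_{i=1}^n \frac{(x_i - \bar x)^2}{\sigma^2} + \frac{n(\bar x - \mu)^2}{\sigma^2} \tag{8.3.16}$$

Thus the joint density of $X_1, \ldots, X_n$ may be expressed as

$$f(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^n (x_i - \bar x)^2 + n(\bar x - \mu)^2\right]\right\}$$

Now consider the joint transformation

$$y_1 = \bar x, \qquad y_i = x_i - \bar x, \quad i = 2, \ldots, n$$

We know that

$$x_1 - \bar x = -\sum_{i=2}^n (x_i - \bar x) = -\sum_{i=2}^n y_i$$

so

$$\sum_{i=1}^n (x_i - \bar x)^2 = \left(\sum_{i=2}^n y_i\right)^2 + \sum_{i=2}^n y_i^2$$

and

$$f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = \frac{|J|}{(2\pi)^{n/2}\sigma^n} \exp\left\{-\frac{1}{2\sigma^2}\left[\left(\sum_{i=2}^n y_i\right)^2 + \sum_{i=2}^n y_i^2 + n(y_1 - \mu)^2\right]\right\}$$

It is easy to see that the Jacobian, $J$, is a constant, and in particular it can be shown that $|J| = n$. Therefore, the joint density function factors into the marginal density function of $y_1$ times a function of $y_2, \ldots, y_n$ only, which shows that $Y_1 = \bar X$ and the terms $Y_i = X_i - \bar X$ for $i = 2, \ldots, n$ are independent. Because $X_1 - \bar X = -\sum_{i=2}^n (X_i - \bar X)$, it follows that $\bar X$ and $X_1 - \bar X$ also are independent.

Part 2 follows from part 1, because $S^2$ is a function only of the $X_i - \bar X$.

To obtain part 3, consider again equation (8.3.16) applied to the random sample,

$$V_1 = \sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} = \frac{(n - 1)S^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2} = V_2 + V_3$$

From Corollary 8.3.2, $V_1 \sim \chi^2(n)$ and $V_3 \sim \chi^2(1)$. Also, by part 2, $V_2$ and $V_3$ are independent, so

$$M_{V_1}(t) = M_{V_2}(t)\,M_{V_3}(t)$$

and

$$M_{V_2}(t) = \frac{M_{V_1}(t)}{M_{V_3}(t)} = \frac{(1 - 2t)^{-n/2}}{(1 - 2t)^{-1/2}} = (1 - 2t)^{-(n-1)/2}$$

Thus, $V_2 = (n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$. ∎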
Thus, if $c_\gamma$ is the $\gamma$th percentile of the distribution of $S^2$, then

$$c_\gamma = \frac{\sigma^2\,\chi^2_\gamma(n - 1)}{n - 1} \tag{8.3.17}$$


Consider again Example 8.3.1, where it was assumed that $X \sim N(60, 36)$. Suppose that it was decided to sample 25 batteries, and to reject the claim that $\sigma^2 = 36$ if $s^2 > 54.63$, and not reject the claim if $s^2 \le 54.63$. Under this procedure, what would be the probability of rejecting the claim when in fact $\sigma^2 = 36$? We see that

$$P[S^2 > 54.63] = P[24S^2/36 > 36.42] = 1 - H(36.42; 24) = 0.05$$

If instead one wished to be wrong only 1% of the time when rejecting the claim, then the procedure would be to reject if $s^2 > c_{0.99}$, where $c_{0.99} = \sigma^2\chi^2_{0.99}(n - 1)/(n - 1) = 36(42.98)/24 = 64.47$.
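Both numbers in the battery example can be reproduced from the chi-square CDF. This is a hedged sketch: the series-based helpers `chi2_cdf` and `chi2_ppf` are hypothetical stand-ins for Tables 4 and 5, not the book's procedure.

```python
import math


def chi2_cdf(x: float, nu: float) -> float:
    """H(x; nu) via the regularized lower incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_ppf(gamma: float, nu: float) -> float:
    """chi^2_gamma(nu) by bisection on chi2_cdf."""
    lo, hi = 0.0, nu + 10.0 * math.sqrt(2.0 * nu) + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, sigma2 = 25, 36.0
df = n - 1

# Probability of rejecting sigma^2 = 36 under the rule s^2 > 54.63
p_reject = 1.0 - chi2_cdf(df * 54.63 / sigma2, df)

# Cutoff c_0.99 from (8.3.17) for a 1% rejection probability
c99 = sigma2 * chi2_ppf(0.99, df) / df
print(f"P[S^2 > 54.63] = {p_reject:.4f}")
print(f"c_0.99 = {c99:.2f}")
```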


8.4 


THE t, F, AND BETA DISTRIBUTIONS 


Certain functions of normal samples are very important in statistical analysis of 
populations. 


STUDENT'S t DISTRIBUTION 


We noticed that $S^2$ can be used to make inferences about the parameter $\sigma^2$ in a normal distribution. Similarly, $\bar X$ is useful concerning the parameter $\mu$; however, the distribution of $\bar X$ also depends on the parameter $\sigma^2$. This makes it impossible to use $\bar X$ for certain types of statistical procedures concerning the mean when $\sigma^2$ is unknown. It turns out that if $\sigma$ is replaced by $S$ in the quantity $\sqrt{n}(\bar X - \mu)/\sigma$, then the resulting distribution is no longer standard normal, but it does not depend on $\sigma$, and it can be derived using transformation methods.


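Theorem 8.4.1 If $Z \sim N(0, 1)$ and $V \sim \chi^2(\nu)$, and if $Z$ and $V$ are independent, then the distribution of

$$T = \frac{Z}{\sqrt{V/\nu}} \tag{8.4.1}$$

is referred to as Student's t distribution with $\nu$ degrees of freedom, denoted by $T \sim t(\nu)$. The pdf is given by

$$f(t; \nu) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}, \quad -\infty < t < \infty \tag{8.4.2}$$

Proof

The joint density of $Z$ and $V$ is given by

$$f_{Z,V}(z, v) = \frac{v^{\nu/2 - 1} e^{-v/2} e^{-z^2/2}}{\sqrt{2\pi}\,\Gamma(\nu/2)\,2^{\nu/2}}, \quad -\infty < z < \infty,\ 0 < v < \infty$$

Consider the transformation $T = Z/\sqrt{V/\nu}$, $W = V$, with inverse transformation $v = w$, $z = t\sqrt{w/\nu}$. The Jacobian is $J = \sqrt{w/\nu}$ and

$$f_{T,W}(t, w) = \frac{(w/\nu)^{1/2}\, w^{\nu/2 - 1} e^{-w/2} e^{-t^2 w/(2\nu)}}{\sqrt{2\pi}\,\Gamma(\nu/2)\,2^{\nu/2}}, \quad -\infty < t < \infty,\ 0 < w < \infty$$

After some simplification, the marginal pdf $f(t; \nu) = \int_0^\infty f_{T,W}(t, w)\,dw$ yields equation (8.4.2). ∎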


The t distribution is symmetric about zero, and its general shape is similar to that of the standard normal distribution. Indeed, the t distribution approaches the standard normal distribution as $\nu \to \infty$. For smaller $\nu$ the t distribution is flatter with thicker tails and, in fact, $T \sim \mathrm{CAU}(1, 0)$ when $\nu = 1$.

Percentiles, $t_\gamma(\nu)$, are provided in Table 6 (Appendix C) for selected values of $\gamma$ and for $\nu = 1, \ldots, 30, 40, 60, 120, \infty$.
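The claims in the preceding paragraph can be seen by evaluating (8.4.2) directly. The sketch below is an illustrative, stdlib-only check with hypothetical helper names. It confirms that $f(t; 1)$ is the standard Cauchy density $1/[\pi(1 + t^2)]$, that $f(0; \nu)$ rises toward $1/\sqrt{2\pi}$ as $\nu$ grows, and that at $t = 3$ the t density lies above the normal density (thicker tails).

```python
import math


def t_pdf(t: float, nu: float) -> float:
    """Student's t pdf, equation (8.4.2), computed on the log scale."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def cauchy_pdf(t: float) -> float:
    """Standard Cauchy density, i.e. CAU(1, 0)."""
    return 1.0 / (math.pi * (1.0 + t * t))


for nu in (1, 3, 10, 30, 1000):
    print(f"nu = {nu:5d}: f(0) = {t_pdf(0.0, nu):.5f}, f(3) = {t_pdf(3.0, nu):.5f}")
print(f"N(0, 1):     phi(0) = {normal_pdf(0.0):.5f}, phi(3) = {normal_pdf(3.0):.5f}")
```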
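Theorem 8.4.2 If $T \sim t(\nu)$, then for $\nu > 2r$,

$$E(T^{2r}) = \frac{\Gamma\left(\frac{2r + 1}{2}\right)\Gamma\left(\frac{\nu - 2r}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{\nu}{2}\right)}\,\nu^r \tag{8.4.3}$$

$$E(T^{2r - 1}) = 0, \quad r = 1, 2, \ldots \tag{8.4.4}$$

$$\mathrm{Var}(T) = \frac{\nu}{\nu - 2}, \quad 2 < \nu \tag{8.4.5}$$

Proof

The $2r$th moment is

$$E(T^{2r}) = E(Z^{2r})\,E[(V/\nu)^{-r}]$$

where $Z \sim N(0, 1)$ and $V \sim \chi^2(\nu)$. Substitution of the normal and chi-square moments gives the required result. ∎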
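As a numerical cross-check of Theorem 8.4.2, the sketch below integrates $t^k f(t; \nu)$ with the trapezoid rule and compares the result with (8.4.3). It is a minimal illustration under stated assumptions: the truncation at $|t| \le 200$, the step size, and all helper names are hypothetical choices, reasonable here because for $\nu = 10$ the tails beyond 200 contribute far less than the tolerances used.

```python
import math


def t_pdf(t: float, nu: float) -> float:
    """Student's t pdf, equation (8.4.2)."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_moment_numeric(k: int, nu: float, half_width: float = 200.0,
                     h: float = 0.01) -> float:
    """Approximate E(T^k) with the trapezoid rule on [-half_width, half_width]."""
    steps = int(round(2 * half_width / h))
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * t ** k * t_pdf(t, nu)
    return total * h


def t_even_moment(r: int, nu: float) -> float:
    """E(T^(2r)) from equation (8.4.3); finite only when nu > 2r."""
    if nu <= 2 * r:
        raise ValueError("E(T^(2r)) is infinite unless nu > 2r")
    log_val = (math.lgamma(r + 0.5) + math.lgamma((nu - 2 * r) / 2)
               - math.lgamma(0.5) - math.lgamma(nu / 2) + r * math.log(nu))
    return math.exp(log_val)


nu = 10
print("integral of pdf:", t_moment_numeric(0, nu))
print("E(T):", t_moment_numeric(1, nu))
print("E(T^2): numeric", t_moment_numeric(2, nu), "formula", t_even_moment(1, nu))
print("E(T^4): numeric", t_moment_numeric(4, nu), "formula", t_even_moment(2, nu))
```

For $\nu = 10$ both routes give $\mathrm{Var}(T) = 10/8 = 1.25$, in agreement with (8.4.5).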


As suggested earlier, one application of the t distribution arises when sampling from a normal distribution, as illustrated by the following theorem.

Theorem 8.4.3 If $X_1, \ldots, X_n$ denotes a random sample from $N(\mu, \sigma^2)$, then

$$\frac{\sqrt{n}(\bar X - \mu)}{S} \sim t(n - 1) \tag{8.4.6}$$

Proof

This follows from Theorem 8.4.1, because $Z = \sqrt{n}(\bar X - \mu)/\sigma \sim N(0, 1)$ and, by Theorem 8.3.6, $V = (n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$, and $\bar X$ and $S^2$ are independent. ∎
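Theorem 8.4.3 can be illustrated by simulation. In the sketch below, the seed, $n = 5$, $\mu = 10$, $\sigma = 2$, and the number of replications are arbitrary illustrative choices, not values from the text. About 95% of the simulated $\sqrt{n}(\bar X - \mu)/S$ values should fall at or below $t_{0.95}(4) = 2.132$. When the true $\sigma$ replaces $S$, the fraction instead approaches $\Phi(2.132) \approx 0.983$.

```python
import math
import random


def t_statistic(sample: list[float], mu: float) -> float:
    """sqrt(n) (xbar - mu) / S, the statistic of Theorem 8.4.3."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2)


rng = random.Random(2024)          # fixed seed so the run is reproducible
n, mu, sigma = 5, 10.0, 2.0        # illustrative values, not from the text
cutoff = 2.132                     # t_0.95(4) from Table 6
reps = 40_000

hits_t = hits_z = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    if t_statistic(sample, mu) <= cutoff:
        hits_t += 1
    # Same sample, but with the true sigma in place of S
    xbar = sum(sample) / n
    if math.sqrt(n) * (xbar - mu) / sigma <= cutoff:
        hits_z += 1

frac_t = hits_t / reps   # near 0.95, as Theorem 8.4.3 predicts
frac_z = hits_z / reps   # near Phi(2.132), about 0.983
print(f"P[T <= 2.132] ~ {frac_t:.4f};  P[Z <= 2.132] ~ {frac_z:.4f}")
```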


SNEDECOR’S F DISTRIBUTION 


Another derived distribution of great importance in statistics is called Snedecor’s 
F distribution. 


Theorem 8.4.4 If $V_1 \sim \chi^2(\nu_1)$ and $V_2 \sim \chi^2(\nu_2)$ are independent, then the random variable

$$X = \frac{V_1/\nu_1}{V_2/\nu_2} \tag{8.4.7}$$
276 CHAPTER 8 STATISTICS AND SAMPLING DISTRIBUTIONS 


has the following pdf for $x > 0$:

$$g(x; \nu_1, \nu_2) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 - 1} \left(1 + \frac{\nu_1}{\nu_2}x\right)^{-(\nu_1 + \nu_2)/2} \tag{8.4.8}$$

This is known as Snedecor's F distribution with $\nu_1$ and $\nu_2$ degrees of freedom, and is denoted by $X \sim F(\nu_1, \nu_2)$. Some authors use the notation $F$ rather than $X$ for the ratio (8.4.7).

The pdf (8.4.8) can be derived in a manner similar to that of the t distribution in Theorem 8.4.1.

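Theorem 8.4.5 If $X \sim F(\nu_1, \nu_2)$, then

$$E(X^r) = \left(\frac{\nu_2}{\nu_1}\right)^r \frac{\Gamma\left(\frac{\nu_1}{2} + r\right)\Gamma\left(\frac{\nu_2}{2} - r\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}, \quad \nu_2 > 2r \tag{8.4.9}$$

$$E(X) = \frac{\nu_2}{\nu_2 - 2}, \quad 2 < \nu_2 \tag{8.4.10}$$

$$\mathrm{Var}(X) = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}, \quad 4 < \nu_2 \tag{8.4.11}$$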


Proof

These results follow from the fact that $V_1$ and $V_2$ are independent, and from the chi-square moments (8.3.6). Specifically, they can be obtained from

$$E(X^r) = \left(\frac{\nu_2}{\nu_1}\right)^r E(V_1^r)\,E(V_2^{-r}) \tag{8.4.12}$$


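Percentiles $f_\gamma(\nu_1, \nu_2)$ of $X \sim F(\nu_1, \nu_2)$ such that

$$P[X \le f_\gamma(\nu_1, \nu_2)] = \gamma \tag{8.4.13}$$

are provided in Table 7 (Appendix C) for selected values of $\gamma$, $\nu_1$, and $\nu_2$. Percentiles for small values of $\gamma$ can be obtained by using the fact that if $X \sim F(\nu_1, \nu_2)$, then $Y = 1/X \sim F(\nu_2, \nu_1)$. Thus,

$$1 - \gamma = P[X \le f_{1-\gamma}(\nu_1, \nu_2)] = P\left[\frac{1}{X} \ge \frac{1}{f_{1-\gamma}(\nu_1, \nu_2)}\right] = 1 - P\left[Y \le \frac{1}{f_{1-\gamma}(\nu_1, \nu_2)}\right]$$

so that

$$\frac{1}{f_{1-\gamma}(\nu_1, \nu_2)} = f_\gamma(\nu_2, \nu_1)$$

and

$$f_{1-\gamma}(\nu_1, \nu_2) = \frac{1}{f_\gamma(\nu_2, \nu_1)} \tag{8.4.14}$$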
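Definition (8.4.7) and the reciprocal relation (8.4.14) can be checked by Monte Carlo. The sketch below builds each F variate from two independent chi-square draws. It relies on the fact that $\chi^2(\nu) = \mathrm{GAM}(2, \nu/2)$, and that Python's `random.gammavariate(alpha, beta)` uses shape `alpha` and scale `beta`. The seed and replication count are arbitrary, and the helper `f_variate` is hypothetical. With $\gamma = 0.95$, (8.4.14) says $f_{0.05}(20, 15) = 1/f_{0.95}(15, 20) = 1/2.20$.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility


def f_variate(nu1: int, nu2: int) -> float:
    """One draw of X = (V1/nu1)/(V2/nu2), as in (8.4.7).

    V ~ chi^2(nu) is GAM(2, nu/2); random.gammavariate(alpha, beta) has
    shape alpha and scale beta, so alpha = nu/2 and beta = 2.
    """
    v1 = rng.gammavariate(nu1 / 2, 2.0)
    v2 = rng.gammavariate(nu2 / 2, 2.0)
    return (v1 / nu1) / (v2 / nu2)


reps = 60_000
f95 = 2.20  # tabled f_0.95(15, 20)

# Fraction of F(15, 20) draws at or below f_0.95(15, 20): about 0.95
p_upper = sum(f_variate(15, 20) <= f95 for _ in range(reps)) / reps
# Fraction of F(20, 15) draws at or below 1/f_0.95(15, 20): about 0.05
p_lower = sum(f_variate(20, 15) <= 1 / f95 for _ in range(reps)) / reps
print(f"P[F(15,20) <= 2.20] ~ {p_upper:.4f};  P[F(20,15) <= 1/2.20] ~ {p_lower:.4f}")
```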


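Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be independent random samples from populations with respective distributions $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$. If $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$, then $\nu_1 S_1^2/\sigma_1^2 \sim \chi^2(\nu_1)$ and $\nu_2 S_2^2/\sigma_2^2 \sim \chi^2(\nu_2)$, so that

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2\,\sigma_2^2}{S_2^2\,\sigma_1^2} \sim F(\nu_1, \nu_2)$$

and thus

$$P\left[\frac{S_1^2\,\sigma_2^2}{S_2^2\,\sigma_1^2} \le f_{0.95}(\nu_1, \nu_2)\right] = 0.95$$

and

$$P\left[\frac{S_1^2}{S_2^2\, f_{0.95}(\nu_1, \nu_2)} \le \frac{\sigma_1^2}{\sigma_2^2}\right] = 0.95$$

If $n_1 = 16$ and $n_2 = 21$, then $f_{0.95}(15, 20) = 2.20$, and for two such samples it usually is said that we are 95% "confident" that the ratio $\sigma_1^2/\sigma_2^2 \ge s_1^2/[s_2^2 f_{0.95}(15, 20)]$. This notion will be further developed in a later chapter.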


BETA DISTRIBUTION 


An F variable can be transformed to have the beta distribution. If $X \sim F(\nu_1, \nu_2)$, then the random variable

$$Y = \frac{(\nu_1/\nu_2)X}{1 + (\nu_1/\nu_2)X} \tag{8.4.15}$$

has the pdf

$$f(y; a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, y^{a-1}(1 - y)^{b-1}, \quad 0 < y < 1 \tag{8.4.16}$$

where $a = \nu_1/2$ and $b = \nu_2/2$. This pdf defines the beta distribution with parameters $a > 0$ and $b > 0$, denoted $Y \sim \mathrm{BETA}(a, b)$.

The mean and variance of $Y$ easily are shown to be

$$E(Y) = \frac{a}{a + b} \tag{8.4.17}$$

$$\mathrm{Var}(Y) = \frac{ab}{(a + b + 1)(a + b)^2} \tag{8.4.18}$$

The $\gamma$th percentile of a beta distribution can be expressed in terms of a percentile of the F distribution as a result of equation (8.4.15), namely

$$y_\gamma(a, b) = \frac{a f_\gamma(2a, 2b)}{b + a f_\gamma(2a, 2b)} \tag{8.4.19}$$

If $a$ and $b$ are positive integers, then successive integration by parts leads to a relationship between the CDFs of beta and binomial distributions. If $X \sim \mathrm{BIN}(n, p)$ and $Y \sim \mathrm{BETA}(n - i + 1, i)$, then $F_X(i - 1) = F_Y(1 - p)$.
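For integer parameters, the binomial-beta identity can be checked in exact rational arithmetic. The sketch below is illustrative and the helper names are hypothetical. It computes the beta CDF by expanding $(1 - y)^{b-1}$ with the binomial theorem and integrating term by term. The normalizing constant $\Gamma(a+b)/[\Gamma(a)\Gamma(b)]$ equals $(a+b-1)!/[(a-1)!(b-1)!]$. The choice $n = 10$, $p = 3/10$ is arbitrary.

```python
from fractions import Fraction
from math import comb, factorial


def binom_cdf(k: int, n: int, p: Fraction) -> Fraction:
    """F_X(k) for X ~ BIN(n, p), computed exactly."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def beta_cdf_int(x: Fraction, a: int, b: int) -> Fraction:
    """CDF of BETA(a, b) at x for positive integers a, b, computed exactly."""
    const = Fraction(factorial(a + b - 1), factorial(a - 1) * factorial(b - 1))
    # integral_0^x y^(a-1) (1-y)^(b-1) dy, expanding (1-y)^(b-1) term by term
    integral = sum(Fraction(comb(b - 1, j) * (-1) ** j) * x ** (a + j) / (a + j)
                   for j in range(b))
    return const * integral


n, p = 10, Fraction(3, 10)
for i in range(1, n + 1):
    lhs = binom_cdf(i - 1, n, p)
    rhs = beta_cdf_int(1 - p, n - i + 1, i)
    print(i, lhs == rhs, float(lhs))
```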

The beta distribution arises in connection with distributions of order statistics. For a continuous random variable $X \sim f(x)$, the pdf of the $k$th order statistic from a random sample of size $n$ is given by

$$g_k(x_{k:n}) = \frac{n!}{(k - 1)!\,(n - k)!}\,[F(x_{k:n})]^{k-1}\,[1 - F(x_{k:n})]^{n-k}\, f(x_{k:n})$$

Making the change of variable $U_{k:n} = F(X_{k:n})$ gives

$$U_{k:n} \sim \mathrm{BETA}(k, n - k + 1)$$

Because $U = F(X) \sim \mathrm{UNIF}(0, 1)$, it also follows that $U_{k:n}$ represents the $k$th smallest of $n$ ordered uniform random variables. The CDF of $X_{k:n}$ can be expressed in terms of a beta CDF, because

$$G_k(x_{k:n}) = P[X_{k:n} \le x_{k:n}] = P[F(X_{k:n}) \le F(x_{k:n})] = H(F(x_{k:n}); k, n - k + 1)$$

where $H(y; a, b)$ denotes the CDF of $Y \sim \mathrm{BETA}(a, b)$.


Suppose that $X \sim \mathrm{EXP}(\theta)$, and one wishes to compute probabilities concerning $X_{k:n}$. We have $F(x) = 1 - e^{-x/\theta}$ and $U_{k:n} = F(X_{k:n}) \sim \mathrm{BETA}(k, n - k + 1)$, so that

$$P[X_{k:n} \le c] = P[F(X_{k:n}) \le F(c)] = P[U_{k:n} \le F(c)] = P\left[\frac{(n - k + 1)U_{k:n}}{k(1 - U_{k:n})} \le \frac{(n - k + 1)F(c)}{k[1 - F(c)]}\right]$$

where this last probability involves a variable distributed as $F(2k, 2(n - k + 1))$. Thus for specified values of $\theta$, $c$, $k$, and $n$, this probability can be obtained from a cumulative beta table, or from a cumulative F table if the proper $\gamma$ level is available. For the purpose of illustration, suppose we wish to know $c_\gamma$ such that

$$P[X_{k:n} \le c_\gamma] = \gamma$$

then

$$f_\gamma(2k, 2(n - k + 1)) = \frac{(n - k + 1)F(c_\gamma)}{k[1 - F(c_\gamma)]} = \frac{(n - k + 1)[1 - \exp(-c_\gamma/\theta)]}{k\exp(-c_\gamma/\theta)}$$

and

$$c_\gamma = \theta \ln\left[1 + \frac{k\, f_\gamma(2k, 2(n - k + 1))}{n - k + 1}\right] \tag{8.4.20}$$

If $n = 11$, $k = 6$, and $\gamma = 0.95$, then

$$c_\gamma = \theta \ln\left[1 + \frac{6(2.69)}{6}\right] = 1.31\theta$$

and

$$P[X_{6:11} \le 1.31\theta] = 0.95$$

or

$$P[X_{6:11}/1.31 \le \theta] = 0.95$$

where $X_{6:11}$ is the median of the sample, and $\theta$ is the mean of the population.
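The value $c_{0.95} = 1.31\theta$ can be reproduced without an F table. Since $X_{k:n} \le c$ exactly when at least $k$ of the $n$ observations fall at or below $c$, $P[X_{k:n} \le c]$ is a binomial upper-tail sum in $F(c)$, which is the binomial-beta link stated earlier. The sketch below solves for $c_{0.95}$ by bisection and then inverts (8.4.20) to recover the implied $f_{0.95}(12, 12)$. Taking $\theta = 1$ is an illustrative choice that loses nothing, because $c_\gamma$ scales with $\theta$. The helper names are hypothetical.

```python
import math


def order_stat_cdf(c: float, k: int, n: int, theta: float) -> float:
    """P[X_{k:n} <= c] for a random sample of size n from EXP(theta).

    X_{k:n} <= c exactly when at least k of the n observations are <= c,
    and each one is, independently, with probability F(c) = 1 - exp(-c/theta).
    """
    u = 1.0 - math.exp(-c / theta)
    return sum(math.comb(n, j) * u ** j * (1.0 - u) ** (n - j)
               for j in range(k, n + 1))


def order_stat_percentile(gamma: float, k: int, n: int, theta: float) -> float:
    """Solve P[X_{k:n} <= c] = gamma for c by bisection."""
    lo, hi = 0.0, 50.0 * theta
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if order_stat_cdf(mid, k, n, theta) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta, k, n = 1.0, 6, 11
c95 = order_stat_percentile(0.95, k, n, theta)
# Invert (8.4.20) to recover the F percentile implied by c95
f_implied = (n - k + 1) * (math.exp(c95 / theta) - 1.0) / k
print(f"c_0.95 = {c95:.4f} theta;  implied f_0.95(12, 12) = {f_implied:.3f}")
print(f"P[X_6:11 <= 1.31 theta] = {order_stat_cdf(1.31, k, n, theta):.4f}")
```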


We have defined the beta distribution and we have seen its relationship to the 
F distribution and the binomial CDF, as well as its application to the distribu- 
tion of ordered uniform random variables. The beta distribution represents a 
generalization of the uniform distribution, and provides a rather flexible two- 
parameter model for various types of variables that must lie between 0 and 1. 
8.5


LARGE-SAMPLE APPROXIMATIONS


The sampling distributions discussed in the earlier sections have approximations that apply for large sample sizes.

Theorem 8.5.1 If $Y_\nu \sim \chi^2(\nu)$, then

$$Z_\nu = \frac{Y_\nu - \nu}{\sqrt{2\nu}} \xrightarrow{d} Z \sim N(0, 1)$$

as $\nu \to \infty$.

Proof

This follows from the CLT, because $Y_\nu$ is distributed as a sum, $\sum_{i=1}^{\nu} X_i$, where $X_1, \ldots, X_\nu$ are independent, and $X_i \sim \chi^2(1)$, so that $E(X_i) = 1$ and $\mathrm{Var}(X_i) = 2$. ∎


We also would expect the pdf's of $Z_\nu$ to closely approximate the pdf of $Z$ for large $\nu$. This is illustrated in Figure 8.1, which shows the pdf's for $\nu = 20$, 80, and 200.

FIGURE 8.1 Comparison of pdf's of standardized chi-square and standard normal distributions
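It follows that chi-square percentiles can be approximated in terms of standard normal percentiles for large $\nu$. Specifically,

$$\gamma = P[Y_\nu \le \chi^2_\gamma(\nu)] = P\left[\frac{Y_\nu - \nu}{\sqrt{2\nu}} \le \frac{\chi^2_\gamma(\nu) - \nu}{\sqrt{2\nu}}\right] \approx \Phi\left(\frac{\chi^2_\gamma(\nu) - \nu}{\sqrt{2\nu}}\right)$$

so that

$$\frac{\chi^2_\gamma(\nu) - \nu}{\sqrt{2\nu}} \approx z_\gamma$$

and

$$\chi^2_\gamma(\nu) \approx \nu + z_\gamma\sqrt{2\nu} \tag{8.5.1}$$

For example, for $\nu = 30$ and $\gamma = 0.95$,

$$\chi^2_{0.95}(30) \approx 30 + 1.645\sqrt{60} = 42.74$$

compared to the exact value $\chi^2_{0.95}(30) = 43.77$. A more accurate approximation, known as the Wilson-Hilferty approximation, is given by

$$\chi^2_\gamma(\nu) \approx \nu\left[1 - \frac{2}{9\nu} + z_\gamma\sqrt{\frac{2}{9\nu}}\right]^3 \tag{8.5.2}$$

This gives approximate values of $\chi^2_\gamma(\nu)/\nu$ within 0.01 of exact values for $\nu \ge 3$ and $0.01 \le \gamma \le 0.99$. For example, if $\nu = 30$ and $\gamma = 0.95$, approximation (8.5.2) gives $\chi^2_{0.95}(30) \approx 43.77$, which agrees to two decimal places with the exact value.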
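The two approximations (8.5.1) and (8.5.2) take only a few lines to evaluate. The sketch below uses the standard library's `statistics.NormalDist().inv_cdf` for $z_\gamma$ in place of the normal table. The helper names are hypothetical.

```python
from statistics import NormalDist


def chi2_normal_approx(gamma: float, nu: float) -> float:
    """chi^2_gamma(nu) ~ nu + z_gamma sqrt(2 nu), equation (8.5.1)."""
    z = NormalDist().inv_cdf(gamma)
    return nu + z * (2 * nu) ** 0.5


def chi2_wilson_hilferty(gamma: float, nu: float) -> float:
    """Wilson-Hilferty approximation, equation (8.5.2)."""
    z = NormalDist().inv_cdf(gamma)
    c = 2 / (9 * nu)
    return nu * (1 - c + z * c ** 0.5) ** 3


print(round(chi2_normal_approx(0.95, 30), 2))    # 42.74
print(round(chi2_wilson_hilferty(0.95, 30), 2))  # 43.77
```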

It also is possible to derive asymptotic normal distributions directly for $S_n^2$ and $S_n$.

Let $S_n^2$ denote the sample variance from a random sample of size $n$ from $N(\mu, \sigma^2)$. We know that

$$V_n = \frac{(n - 1)S_n^2}{\sigma^2} \sim \chi^2(n - 1)$$

and from Theorem 8.5.1,

$$\frac{V_n - (n - 1)}{\sqrt{2(n - 1)}} \xrightarrow{d} Z \sim N(0, 1)$$

That is,

$$\frac{\sqrt{n - 1}\,(S_n^2 - \sigma^2)}{\sigma^2\sqrt{2}} \xrightarrow{d} Z \sim N(0, 1) \tag{8.5.3}$$

or approximately,

$$S_n^2 \sim N\left(\sigma^2, \frac{2\sigma^4}{n - 1}\right) \tag{8.5.4}$$

If $Y_n = S_n^2$ and $g(y) = \sqrt{y}$, then $g'(y) = 1/(2\sqrt{y})$, $g'(\sigma^2) = 1/(2\sigma)$, and approximately,

$$S_n \sim N\left(\sigma, \frac{\sigma^2}{2(n - 1)}\right) \tag{8.5.5}$$


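It also is possible to show that a t-distributed variable has a limiting standard normal distribution as the degrees of freedom $\nu$ increases. To see this, consider a variable $T_\nu \sim t(\nu)$, where

$$T_\nu = \frac{Z}{\sqrt{\chi^2_\nu/\nu}}$$

and $\chi^2_\nu$ denotes a $\chi^2(\nu)$ variable independent of $Z$. We know that $E(\chi^2_\nu/\nu) = 1$, $\mathrm{Var}(\chi^2_\nu/\nu) = 2/\nu$, and by Chebychev's inequality, $P[|\chi^2_\nu/\nu - 1| < \varepsilon] \ge 1 - 2/(\nu\varepsilon^2)$, so that $\chi^2_\nu/\nu \xrightarrow{p} 1$ as $\nu \to \infty$.

Thus Student's t distribution has a limiting standard normal distribution by Theorem 7.7.4, part 3:

$$T_\nu = \frac{Z}{\sqrt{\chi^2_\nu/\nu}} \xrightarrow{d} Z \sim N(0, 1) \tag{8.5.6}$$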


FIGURE 8.2 Comparison of pdf's of t and standard normal distributions

This is illustrated by Figure 8.2, which shows the pdf's of $N(0, 1)$ and $t(\nu)$ for $\nu = 1$, 3, and 10.

This suggests that the t percentile, $t_\gamma(\nu)$, is approximately equal to $z_\gamma$ for large $\nu$, and it leads to the last row of values in Table 6 (Appendix C) that correspond to the standard normal percentiles.

A similar rationale yields approximate percentiles for an F distribution. Suppose $X_{\nu_1,\nu_2} = (V_1/\nu_1)/(V_2/\nu_2)$. As noted in the above discussion, $V_2/\nu_2 \xrightarrow{p} 1$ as $\nu_2 \to \infty$. Thus, if $\nu_1$ is kept fixed, $X_{\nu_1,\nu_2} \xrightarrow{d} V_1/\nu_1$ as $\nu_2 \to \infty$. The resulting approximation for an F percentile is $f_\gamma(\nu_1, \nu_2) \approx \chi^2_\gamma(\nu_1)/\nu_1$ for large $\nu_2$. A similar argument leads to an approximation for large $\nu_1$, namely, $f_\gamma(\nu_1, \nu_2) \approx \nu_2/\chi^2_{1-\gamma}(\nu_2)$. These also provide the limiting entries in Table 7 (Appendix C).


SUMMARY 


Our purpose in this chapter was to study properties of the normal distribution and to derive other related distributions that arise in the statistical analysis of data from normally distributed populations.

An important property of the normal distribution is that linear combinations of independent normal random variables are also normally distributed, which means, among other things, that the sample mean is normally distributed. A certain function of the sample variance is shown to be chi-square distributed, and the sample mean and sample variance are shown to be independent random variables. This is important in the development of statistical methods for the analysis of the population mean when the population variance is unknown. It leads to Student's t distribution, which is obtained as the distribution of a standard normal variable divided by the square root of an independent chi-square variable over its degrees of freedom. Another example involves Snedecor's F distribution, which is obtained as the distribution of the ratio of two independent chi-square variables over their respective degrees of freedom. Variables of the latter type are important in statistical analyses that compare the variances of two normally distributed populations.


EXERCISES 


1. Let $X$ denote the weight in pounds of a bag of feed, where $X \sim N(101, 4)$. What is the probability that 20 bags will weigh at least a ton?


2. $S$ denotes the diameter of a shaft and $B$ the diameter of a bearing, where $S$ and $B$ are independent with $S \sim N(1, 0.0004)$ and $B \sim N(1.01, 0.0009)$.
(a) If a shaft and a bearing are selected at random, what is the probability that the shaft diameter will exceed the bearing diameter?
(b) Assume equal variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and find the value of $\sigma$ that will yield a probability of noninterference of 0.95.


3. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a normal distribution, $X_i \sim N(\mu, \sigma^2)$, and define $U = \sum_{i=1}^n X_i$ and $W = \sum_{i=1}^n X_i^2$.
(a) Find a statistic that is a function of $U$ and $W$ and unbiased for the parameter $\theta = 2\mu - 5\sigma^2$.
(b) Find a statistic that is unbiased for $\sigma^2 + \mu^2$.
(c) Let $c$ be a constant, and define $Y_i = 1$ if $X_i \le c$ and zero otherwise. Find a statistic that is a function of $Y_1, Y_2, \ldots, Y_n$ and also unbiased for $F_X(c) = \Phi\left(\frac{c - \mu}{\sigma}\right)$.


4. Assume that $X_1$ and $X_2$ are independent normal random variables, $X_i \sim N(\mu, \sigma^2)$, and let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. Show that $Y_1$ and $Y_2$ are independent and normally distributed.


5. A new component is placed in service and nine spares are available. The times to failure in days are independent exponential variables, $T_i \sim \mathrm{EXP}(100)$.
(a) What is the distribution of $\sum_{i=1}^{10} T_i$?
(b) What is the probability that successful operation can be maintained for at least 1.5 years? Hint: Use Theorem 8.3.3 to transform to a chi-square variable.
(c) How many spares would be needed to be 95% sure of successful operation for at least two years?


6. Repeat Exercise 5 assuming $T_i \sim \mathrm{GAM}(100, 1.2)$.


7. Five independent tasks are to be performed, where the time in hours to complete the $i$th task is given by $T_i \sim \mathrm{GAM}(100, \kappa_i)$, where $\kappa_i = 3 + i/3$. What is the probability that it will take less than 2600 hours to complete all five tasks?


8. Suppose that $X \sim \chi^2(m)$, $Y \sim \chi^2(n)$, and $X$ and $Y$ are independent. Is $Y - X \sim \chi^2$ if $n > m$?


9. Suppose that $X \sim \chi^2(m)$, $S = X + Y \sim \chi^2(m + n)$, and $X$ and $Y$ are independent. Use MGFs to show that $S - X \sim \chi^2(n)$.


10. A random sample of size $n = 15$ is drawn from $\mathrm{EXP}(\theta)$. Find $c$ so that $P[c\bar X < \theta] = 0.95$, where $\bar X$ is the sample mean.


11. Let $Z \sim N(0, 1)$.
(a) Find $P[Z^2 < 3.84]$ using tabled values of the standard normal distribution.
(b) Find $P[Z^2 < 3.84]$ using tabled values of the chi-square distribution.


12. The distance in feet by which a parachutist misses a target is $D = \sqrt{X_1^2 + X_2^2}$, where $X_1$ and $X_2$ are independent with $X_i \sim N(0, 25)$. Find $P[D < 12.25 \text{ feet}]$.
13. Consider independent random variables $Z_i \sim N(0, 1)$, $i = 1, \ldots, 16$, and let $\bar Z$ be the sample mean. Find:


(a) P[Z <3]. 
(b) $P[Z_1 - Z_2 < 2]$.
(c) $P[Z_1 + Z_2 < 2]$.


16 
(d) of ya x2 


i=1 
(c) af y G22 25 


14. If $T \sim t(\nu)$, give the distribution of $T^2$.


15. Suppose that $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$, and $Z_i \sim N(0, 1)$, $i = 1, \ldots, k$, and all variables are independent. State the distribution of each of the following variables if it is a "named" distribution or otherwise state "unknown."


@%-% OF 
ee 
(b) X¥, + 2X, G) Zz. 
X,-X, xX 
a ae i= 
Oe «z 
(d) Z2 0 VnkX — ») 
ao | YZ? 
i=1 
= Sie k 
Ro Tae ‘ 
FRM yz, 
On, o i=1 
5} 
: Z, 
(f) 22+ 22 (n) +e 
(g) Zi - 23 (0) kZ? 
z, (k — 1) 5 (X,-X) 
h i=. 
(h) z (p) 


= +—____ 
(n — 1)o? )) (Z, - Z)? 
Peps 


16. Let $X_1, X_2, \ldots, X_9$ be a random sample from a normal distribution, $X_i \sim N(6, 25)$, and denote by $\bar X$ and $S^2$ the sample mean and sample variance. Use tables from Appendix C to find each of the following:

(a) $P[3 < \bar X < 7]$.
(b) $P[1.860 < 3(\bar X - 6)/S]$.
(c) $P[S^2 < 31.9375]$.
17. Use tabled values from Appendix C to find the following:
(a) $P[7.26 < Y < 22.31]$ if $Y \sim \chi^2(15)$.
(b) The value $b$ such that $P[Y < b] = 0.75$ if $Y \sim \chi^2(23)$.


(c) $P\left[\dfrac{Y}{1 + Y} > \dfrac{11}{16}\right]$ if $Y \sim \chi^2(6)$.
(d) $P[0.87 < T < 2.65]$ if $T \sim t(13)$.
(e) The value $b$ such that $P[T < b] = 0.60$ if $T \sim t(26)$.
(f) The value $c$ such that $P[|T| > c] = 0.02$ if $T \sim t(23)$.
(g) $P[2.91 < X < 5.52]$ if $X \sim F(7, 12)$.


(h) As > 025 if X ~ F(20, 8). 


18. Assume that $Z$, $V_1$, and $V_2$ are independent random variables with $Z \sim N(0, 1)$, $V_1 \sim \chi^2(5)$, and $V_2 \sim \chi^2(9)$. Find the following:
(a) $P[V_1 + V_2 < 8.6]$.
(b) $P[Z/\sqrt{V_1/5} < 2.015]$.
(c) $P[Z > 0.611\sqrt{V_2}]$.
(d) $P[V_1/V_2 < 1.450]$.

V, 
(e) The value 6 such that Ay = a | = 0.90. 


i 2 
19. If $T \sim t(1)$, then show the following:
(a) The CDF of $T$ is $F(t) = \frac{1}{2} + \frac{1}{\pi}\arctan(t)$.
(b) The $100 \times \gamma$th percentile is $t_\gamma(1) = \tan[\pi(\gamma - 1/2)]$.

20. Show that if $X \sim F(2, 2b)$, then
(a) $P[X > x] = \left(1 + \frac{x}{b}\right)^{-b}$ for all $x > 0$.
(b) The $100 \times \gamma$th percentile is $f_\gamma(2, 2b) = b[(1 - \gamma)^{-1/b} - 1]$.


21. Show that if $F(x; \mu)$ is the CDF of $X \sim \mathrm{POI}(\mu)$, and if $H(y; \nu)$ is the CDF of a chi-square distribution with $\nu$ degrees of freedom, then $F(x; \mu) = 1 - H[2\mu; 2(x + 1)]$. Hint: Use Theorem 3.3.2 and the fact that $Y \sim \chi^2(\nu)$ corresponds to $Y \sim \mathrm{GAM}(2, \nu/2)$.


22. If $X \sim \mathrm{BETA}(p, q)$, derive $E(X^r)$.


23. Consider a random sample from a beta distribution, $X_i \sim \mathrm{BETA}(1, 2)$. Use the CLT (Theorem 7.3.2) to approximate $P[\bar X < 0.5]$ for $n = 12$.


24. Let $Y_n \sim \chi^2(n)$. Find the limiting distribution of $(Y_n - n)/\sqrt{2n}$ as $n \to \infty$, using moment generating functions.


25. Rework Exercise 5(b) and (c) using a normal approximation, and compare to the exact results.


26. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution whose first four moments exist, and let

$$S_n^2 = \sum_{i=1}^n \frac{(X_i - \bar X)^2}{n - 1}$$

Show that $S_n^2 \xrightarrow{p} \sigma^2$ as $n \to \infty$. Hint: Use Theorem 8.2.2 and the Chebychev inequality.

27. Compare the Wilson-Hilferty approximation (Equation 8.5.2) to the exact tabled values of $\chi^2_{0.95}(10)$ and $\chi^2_{0.05}(10)$.
The previous chapters were concerned with developing the concepts of probability and random variable to build mathematical models of nondeterministic physical phenomena. A certain numerical characteristic of the physical phenomenon may be of interest, but its value cannot be computed directly. Instead, it is possible to observe one or more random variables, the distribution of which depends on the characteristic of interest. Our main objective in the next few chapters will be to develop methods to analyze the observed values of random variables in order to gain information about the unknown characteristic.

As mentioned earlier, the process of obtaining an observed result of a physical phenomenon is called an experiment. Suppose that the result of the experiment is a random variable $X$, and $f(x; \theta)$ represents its pdf. It is common practice to consider $X$ as a measurement value obtained from an individual chosen at random from a population. In this context, $f(x; \theta)$ will be referred to as the population pdf, and it reflects the distribution of individual measurements in the population. Complete specification of $f(x; \theta)$ achieves the goal of identifying the distribution of the response of interest.

In some cases it is possible to arrive at a specified model based on axiomatic assumptions or other knowledge about the population, as was the case in certain counting problems discussed earlier. More often, the experimenter is not able to specify the pdf completely, but it may be possible to assume that $f(x; \theta)$ is a member of some known family of distributions (such as normal, gamma, Weibull, or Poisson), and that $\theta$ is an unknown parameter such as the mean or variance of the distribution. The objective of point estimation is to assign an appropriate value for $\theta$ based on observed data from the population. The observed results of repeated trials of an experiment can be modeled mathematically as a random sample from the population pdf. In other words, it is assumed that a set of $n$ independent random variables, $X_1, X_2, \ldots, X_n$, each with pdf $f(x; \theta)$, will be observed, resulting in a set of data $x_1, x_2, \ldots, x_n$. Of course, it is possible to represent the joint pdf of the random sample as a product:

$$f(x_1, x_2, \ldots, x_n; \theta) = f(x_1; \theta)\,f(x_2; \theta) \cdots f(x_n; \theta) \tag{9.1.1}$$

This joint pdf provides the connection between the observed data and the mathematical model for the population. In this chapter we will be concerned with ways to make use of such data in estimating the unknown value of the parameter $\theta$.

In subsequent chapters, other kinds of analyses will be developed. For example, the data not only can provide information about the parameter value, but also can provide information about more basic questions, such as what family of pdf's should be considered to begin with. This notion, which is generally referred to as goodness-of-fit, will be considered in a later chapter. It also is possible to answer certain questions about the population without assuming a functional form for $f(x; \theta)$. Such methods, known as nonparametric methods, as well as other types of analyses, such as confidence intervals and tests of hypotheses about the value of $\theta$, also will be considered later.

In this chapter we will assume that the distribution of a population of interest can be represented by a member of some specified family of pdf's, $f(x; \theta)$, indexed by a parameter $\theta$. In some cases, the parameter will be vector-valued, and we will use boldfaced $\boldsymbol{\theta}$ to denote this.

We will let $\Omega$, called the parameter space, denote the set of all possible values that the parameter $\theta$ could assume. If $\theta$ is a vector, then $\Omega$ will be a subset of a Euclidean space of the same dimension, and the dimension of $\Omega$ will correspond to the number of unknown real parameters.

In what follows, we will assume that $X_1, X_2, \ldots, X_n$ is a random sample from $f(x; \theta)$ and that $\tau(\theta)$ is some function of $\theta$.
Definition 9.1.1 A statistic, $T = \mathscr{t}(X_1, X_2, \ldots, X_n)$, that is used to estimate the value of $\tau(\theta)$ is called an estimator of $\tau(\theta)$, and an observed value of the statistic, $t = \mathscr{t}(x_1, x_2, \ldots, x_n)$, is called an estimate of $\tau(\theta)$.


Of course, this includes the case of estimating the parameter value itself, if we let $\tau(\theta) = \theta$.

Notice that we are using three different kinds of letters in our notation. The capital $T$ represents the statistic that we use as an estimator, the lowercase $t$ is an observed value or estimate, and the script $\mathscr{t}$ represents the function that we apply to the random sample.

Another fairly suggestive notation involves the use of a circumflex (also called a "hat") above the parameter, $\hat\theta$, to distinguish between the unknown parameter value and its estimator. Yet another common notation involves the use of a tilde, $\tilde\theta$. The practice of using capital and lowercase letters to distinguish between estimators and estimates usually is not followed with notations such as $\hat\theta$ and $\tilde\theta$.

Two of the most frequently used approaches to the problem of estimation are 
given in the next section. 


SOME METHODS OF ESTIMATION 


In some cases, reasonable estimators can be found on the basis of intuition, but 
various general methods have been developed for deriving estimators. 


METHOD OF MOMENTS 


The sample mean, $\bar X$, was proposed in Chapter 8 as an estimator of the population mean $\mu$. A more general approach, which produces estimators known as the method of moments estimators (MMEs), can be developed.

Consider a population pdf, $f(x; \theta_1, \ldots, \theta_k)$, depending on one or more parameters $\theta_1, \ldots, \theta_k$. In Chapter 2, the moments about the origin, $\mu'_j$, were defined. Generally, these will depend on the parameters, say

$$\mu'_j(\theta_1, \ldots, \theta_k) = E(X^j), \quad j = 1, 2, \ldots, k \tag{9.2.1}$$

It is possible to define estimators of these distribution moments.


Definition 9.2.1 If $X_1, \ldots, X_n$ is a random sample from $f(x; \theta_1, \ldots, \theta_k)$, the first $k$ sample moments are given by

$$M'_j = \frac{\sum_{i=1}^n X_i^j}{n}, \quad j = 1, 2, \ldots, k \tag{9.2.2}$$

As noted in Chapter 2, the first moment is the mean, $\mu'_1 = \mu$. Similarly, the first sample moment is the sample mean.

Consider the simple case of one unknown parameter, say $\theta = \theta_1$. That $\bar X = M'_1$ is generally a reasonable estimator of $\mu = \mu(\theta)$ suggests using the solution $\hat\theta$ of the equation $M'_1 = \mu(\hat\theta)$ as an estimator of $\theta$. In other words, because $M'_1$ tends to be close to $\mu(\theta)$, under certain conditions we might expect that $\hat\theta$ will tend to be close to $\theta$.

More generally, the method of moments principle is to choose as estimators of the parameters $\theta_1, \ldots, \theta_k$ the values $\hat\theta_1, \ldots, \hat\theta_k$ that render the population moments equal to the sample moments. In other words, $\hat\theta_1, \ldots, \hat\theta_k$ are solutions of the equations

$$M'_j = \mu'_j(\hat\theta_1, \ldots, \hat\theta_k), \quad j = 1, 2, \ldots, k \tag{9.2.3}$$

Example 9.2.1 Consider a random sample from a distribution with two unknown parameters, the mean $\mu$ and the variance $\sigma^2$. We know from earlier considerations that $\mu = \mu'_1$ and $\sigma^2 = E(X^2) - \mu^2 = \mu'_2 - (\mu'_1)^2$, so that the MMEs are solutions of the equations $M'_1 = \hat\mu$ and $M'_2 = \hat\sigma^2 + (\hat\mu)^2$, which are $\hat\mu = \bar X$ and

$$\hat\sigma^2 = \frac{\sum_{i=1}^n X_i^2}{n} - \bar X^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n}$$

Notice that the MME of $\sigma^2$ is closely related to the sample variance that was defined in Chapter 8, namely $\hat\sigma^2 = [(n - 1)/n]S^2$.

Example 9.2.2 Consider a random sample from a two-parameter exponential distribution, $X_i \sim \mathrm{EXP}(1, \eta)$. We know that the mean is $\mu = \mu(\eta) = 1 + \eta$, and if we set $\bar X = 1 + \hat\eta$, then $\hat\eta = \bar X - 1$ is the MME of $\eta$.

Example 9.2.3 Consider now a random sample from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$, and suppose we wish to estimate the probability $p(\theta) = P(X > 1) = e^{-1/\theta}$. Notice that $\mu(\theta) = \mu = \theta$, so the MME of $\theta$ is $\hat\theta = \bar X$. If we reparameterize the model with $p = \tau(\theta) = P(X > 1) = e^{-1/\theta}$, then $\mu = \mu(p) = -1/\ln p$, and if we equate $\bar X = \mu(\hat p) = -1/\ln \hat p$, then the MME of $p$ is $\hat p = e^{-1/\bar X}$. Thus, in this case we see that
$\hat p = p(\hat\theta)$. If a class of estimators has this property, it is said to have an "invariance" property.

Thus, to estimate $\tau(\theta)$, one might first solve $\bar X = \mu(\hat\theta)$ to obtain the MME of $\theta$ and then use $\tau(\hat\theta)$, or else one might express $\mu$ directly in terms of $\tau$ and solve $\bar X = \mu(\hat\tau)$ for the MME of $\tau$. It is not clear that both approaches will always give the same result, but if $\hat\theta$ is an MME of $\theta$, then we also will refer to $\tau(\hat\theta)$ as an MME of $\tau(\theta)$. In general, if the MMEs of the natural parameters $\theta_1, \ldots, \theta_k$ are obtained, then $\hat\tau_j = \tau_j(\hat\theta_1, \ldots, \hat\theta_k)$ will be used to estimate other functions of the natural parameters, rather than require that the moment equations be expressed directly in terms of the $\tau$'s.
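The two routes to the MME of $p = e^{-1/\theta}$ in the exponential example can be compared directly. In the sketch below, the data values are hypothetical and serve only to illustrate the computation. Route 1 plugs $\hat\theta = \bar X$ into $\tau(\theta)$. Route 2 solves $\bar X = -1/\ln p$ for $p$ by bisection, which works because $\mu(p) = -1/\ln p$ is increasing on $(0, 1)$. The two estimates agree, as the invariance property says.

```python
import math

# Hypothetical observations from EXP(theta), for illustration only
data = [0.8, 2.1, 1.3, 0.4, 3.0]
xbar = sum(data) / len(data)

# Route 1: MME of theta is xbar, then plug into tau(theta) = exp(-1/theta)
p_hat_plugin = math.exp(-1.0 / xbar)


# Route 2: solve xbar = mu(p) = -1/ln(p) for p in (0, 1) by bisection
def mu_of_p(p: float) -> float:
    return -1.0 / math.log(p)


lo, hi = 1e-12, 1.0 - 1e-12
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mu_of_p(mid) < xbar:
        lo = mid
    else:
        hi = mid
p_hat_direct = 0.5 * (lo + hi)

print(f"xbar = {xbar:.3f}: plug-in {p_hat_plugin:.6f}, direct {p_hat_direct:.6f}")
```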


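Consider a random sample from a gamma distribution, $X_i \sim \mathrm{GAM}(\theta, \kappa)$. Because $\mu'_1 = \mu = \kappa\theta$ and $\mu'_2 = \sigma^2 + \mu^2 = \kappa\theta^2 + \kappa^2\theta^2 = \kappa(1 + \kappa)\theta^2$, the MMEs satisfy

$$\hat\kappa\hat\theta = \bar X \quad\text{and}\quad \hat\kappa(1 + \hat\kappa)\hat\theta^2 = \sum_{i=1}^n \frac{X_i^2}{n}$$

The resulting MMEs are

$$\hat\theta = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n\bar X} = \frac{(n - 1)S^2}{n\bar X} \quad\text{and}\quad \hat\kappa = \frac{\bar X}{\hat\theta} = \frac{n\bar X^2}{(n - 1)S^2}$$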
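The gamma MMEs can be sketched as a small function. The name `gamma_mme` and the simulation settings (seed, $\theta = 2$, $\kappa = 3$, $n = 20{,}000$) are illustrative assumptions. Python's `random.gammavariate(alpha, beta)` has mean `alpha * beta`, so `alpha` plays the role of $\kappa$ and `beta` the role of $\theta$ in $\mathrm{GAM}(\theta, \kappa)$. By construction, the fitted moments $\hat\kappa\hat\theta$ and $\hat\kappa(1 + \hat\kappa)\hat\theta^2$ reproduce the first two sample moments exactly.

```python
import random


def gamma_mme(xs: list[float]) -> tuple[float, float]:
    """Method of moments estimates (theta_hat, kappa_hat) for GAM(theta, kappa)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # (n - 1) S^2
    theta_hat = ss / (n * xbar)
    kappa_hat = n * xbar ** 2 / ss
    return theta_hat, kappa_hat


rng = random.Random(11)
true_theta, true_kappa = 2.0, 3.0
# gammavariate(alpha, beta): shape alpha = kappa, scale beta = theta
sample = [rng.gammavariate(true_kappa, true_theta) for _ in range(20_000)]
theta_hat, kappa_hat = gamma_mme(sample)
print(f"theta_hat = {theta_hat:.3f}, kappa_hat = {kappa_hat:.3f}")
```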


METHOD OF MAXIMUM LIKELIHOOD 


We now will consider a method that quite often leads to estimators possessing 
desirable properties, particularly large sample properties. The idea is to use a 
value in the parameter space that corresponds to the largest “likelihood” for the 
observed data as an estimate of an unknown parameter. 


Suppose that a coin is biased, and it is known that the average proportion of heads is one of the three values $p = 0.20$, $0.30$, or $0.80$. An experiment consists of tossing the coin twice and observing the number of heads. This could be modeled mathematically as a random sample $X_1, X_2$ of size $n = 2$ from a Bernoulli distribution, $X_i \sim \mathrm{BIN}(1, p)$, where the parameter space is $\Omega = \{0.20, 0.30, 0.80\}$. Notice that the MME of $p$, which is $\bar{X}$, does not produce reasonable estimates in this example, because $\bar{X} = 0$, $0.5$, or $1$ are the only possibilities, and these are not values in $\Omega$.
Consider now the joint pdf of the random sample,

$$f(x_1, x_2; p) = p^{x_1 + x_2}(1 - p)^{2 - x_1 - x_2}$$

for $x_i = 0$ or $1$. The values of $f(x_1, x_2; p)$ are provided in Table 9.1 for the various pairs $(x_1, x_2)$ and values of $p$.
TABLE 9.1 Joint pdf of the numbers of heads for two tosses of a biased coin

                      (x1, x2)
  p        (0, 0)   (0, 1)   (1, 0)   (1, 1)
  0.20     0.64     0.16     0.16     0.04
  0.30     0.49     0.21     0.21     0.09
  0.80     0.04     0.16     0.16     0.64


Suppose that the experiment results in the observed pair $(x_1, x_2) = (0, 0)$. From Table 9.1, it would seem more likely that $p = 0.20$ rather than either of the other two values. Similarly, $(x_1, x_2) = (0, 1)$ or $(1, 0)$ would correspond to $p = 0.30$, and $(x_1, x_2) = (1, 1)$ would correspond to $p = 0.80$. Thus, the estimate that maximizes the “likelihood” for an observed pair $(x_1, x_2)$ is

$$\hat{p} = \begin{cases} 0.20 & \text{if } (x_1, x_2) = (0, 0) \\ 0.30 & \text{if } (x_1, x_2) = (0, 1), (1, 0) \\ 0.80 & \text{if } (x_1, x_2) = (1, 1) \end{cases}$$
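The table lookup above can be sketched in Python by evaluating the joint pdf at each candidate $p$ and keeping the largest value. The names `OMEGA`, `joint_pmf`, and `mle` are illustrative choices for this sketch.

```python
from itertools import product

OMEGA = (0.20, 0.30, 0.80)  # parameter space from the coin example

def joint_pmf(x1, x2, p):
    """f(x1, x2; p) = p^(x1 + x2) (1 - p)^(2 - x1 - x2) for two Bernoulli trials."""
    s = x1 + x2
    return p ** s * (1 - p) ** (2 - s)

def mle(x1, x2):
    """Value of p in OMEGA that maximizes the likelihood of the observed pair."""
    return max(OMEGA, key=lambda p: joint_pmf(x1, x2, p))

for pair in product((0, 1), repeat=2):
    print(pair, mle(*pair))  # MLEs in order: 0.2, 0.3, 0.3, 0.8
```

Because $\Omega$ is finite, the maximization is an exhaustive search, and no derivative is involved.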


More generally, for a set of discrete random variables, the joint density function of a random sample evaluated at a particular set of sample data, say $f(x_1, \ldots, x_n; \theta)$, represents the probability that the observed set of data $x_1, \ldots, x_n$ will occur. For continuous random variables, $f(x_1, \ldots, x_n; \theta)$ is not a probability, but it still reflects the relative “likelihood” that the set of data will occur, and this likelihood depends on the true value of the parameter.


Definition 9.2.2 


Likelihood Function The joint density function of $n$ random variables $X_1, \ldots, X_n$ evaluated at $x_1, \ldots, x_n$, say $f(x_1, \ldots, x_n; \theta)$, is referred to as a likelihood function. For fixed $x_1, \ldots, x_n$ the likelihood function is a function of $\theta$ and often is denoted by $L(\theta)$.

If $X_1, \ldots, X_n$ represents a random sample from $f(x; \theta)$, then

$$L(\theta) = f(x_1; \theta) \cdots f(x_n; \theta)$$


For a given observed set of data, $L(\theta)$ gives the likelihood of that set occurring as a function of $\theta$. The maximum likelihood principle of estimation is to choose as the estimate of $\theta$, for a given set of data, that value for which the observed set of data would have been most likely to occur. That is, if the likelihood of observing a given set of observations is much higher when $\theta = \theta_1$ than when $\theta = \theta_2$, then it is reasonable to choose $\theta_1$ as an estimate of $\theta$ rather than $\theta_2$.
Definition 9.2.3
Maximum Likelihood Estimator Let $L(\theta) = f(x_1, \ldots, x_n; \theta)$, $\theta \in \Omega$, be the joint pdf of $X_1, \ldots, X_n$. For a given set of observations, $(x_1, \ldots, x_n)$, a value $\hat{\theta}$ in $\Omega$ at which $L(\theta)$ is a maximum is called a maximum likelihood estimate (MLE) of $\theta$. That is, $\hat{\theta}$ is a value of $\theta$ that satisfies

$$f(x_1, \ldots, x_n; \hat{\theta}) = \max_{\theta \in \Omega} f(x_1, \ldots, x_n; \theta) \qquad (9.2.5)$$

Notice that if each set of observations $(x_1, \ldots, x_n)$ corresponds to a unique value $\hat{\theta}$, then this procedure defines a function, $\hat{\theta} = t(x_1, \ldots, x_n)$. This same function, when applied to the random sample, $\hat{\theta} = t(X_1, \ldots, X_n)$, is called the maximum likelihood estimator, also denoted MLE. Usually, the same notation, $\hat{\theta}$, is used for both the ML estimate and the ML estimator.

In most applications, $L(\theta)$ represents the joint pdf of a random sample, although the maximum likelihood principle also applies to other cases such as sets of order statistics.

If $\Omega$ is an open interval, and if $L(\theta)$ is differentiable and assumes a maximum on $\Omega$, then the MLE will be a solution of the equation (maximum likelihood equation)

$$\frac{d}{d\theta} L(\theta) = 0 \qquad (9.2.6)$$

Strictly speaking, if one or more solutions of equation (9.2.6) exist, then it should be verified which, if any, maximize $L(\theta)$. Note also that any value of $\theta$ that maximizes $L(\theta)$ also will maximize the log-likelihood, $\ln L(\theta)$, so for computational convenience the alternate form of the maximum likelihood equation,

$$\frac{d}{d\theta} \ln L(\theta) = 0 \qquad (9.2.7)$$

often will be used.


Consider a random sample from a Poisson distribution, $X_i \sim \mathrm{POI}(\theta)$. The likelihood function is

$$L(\theta) = \prod_{i=1}^n f(x_i; \theta) = \frac{e^{-n\theta}\,\theta^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!}$$

and the log-likelihood is

$$\ln L(\theta) = -n\theta + \sum_{i=1}^n x_i \ln\theta - \ln\left(\prod_{i=1}^n x_i!\right)$$

The maximum likelihood equation is

$$\frac{d}{d\theta}\ln L(\theta) = -n + \frac{\sum_{i=1}^n x_i}{\theta} = 0$$

which has the solution $\hat{\theta} = \sum_{i=1}^n x_i/n = \bar{x}$. It is possible to verify that this is a maximum by use of the second derivative,

$$\frac{d^2}{d\theta^2}\ln L(\theta) = -\frac{\sum_{i=1}^n x_i}{\theta^2}$$

which, evaluated at $\hat{\theta} = \bar{x}$, equals $-n/\bar{x} < 0$.
In subsequent examples, unless otherwise indicated, $\sum$ will denote a summation from 1 to $n$. Similarly, $\prod$ will denote a product from 1 to $n$.
Suppose now that we wish to estimate

$$\tau = \tau(\theta) = P[X = 0] = e^{-\theta}$$

We may reparameterize the model in terms of $\tau$ by letting $\theta = -\ln\tau$. We obtain

$$f^*(x; \tau) = \frac{\tau(-\ln\tau)^x}{x!}$$

If $L^*(\tau)$ represents the likelihood function relative to $\tau$, then

$$\ln L^*(\tau) = n\ln\tau + \sum x_i \ln(-\ln\tau) - \ln\prod x_i!$$

$$\frac{d\ln L^*(\tau)}{d\tau} = \frac{n}{\tau} + \frac{\sum x_i}{\tau\ln\tau}$$

and setting the derivative equal to zero gives

$$-\ln\hat{\tau} = \bar{x}, \qquad \hat{\tau} = e^{-\bar{x}}$$

In this example, it follows that $\hat{\tau} = \tau(\hat{\theta})$.


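We could have maximized $L^*(\tau)$ relative to $\tau$ in this case directly by the chain rule without carrying out the reparameterization. Specifically,

$$\frac{d}{d\theta}\ln L(\theta) = \frac{d}{d\tau}\ln L^*(\tau)\cdot\frac{d\tau}{d\theta}$$

and if $d\tau/d\theta \ne 0$, then $d/d\tau[\ln L^*(\tau)] = 0$ whenever $d/d\theta[\ln L(\theta)] = 0$, so that the maximum with respect to $\tau$ occurs at $\tau(\hat{\theta})$.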
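The agreement between maximizing in $\theta$ and maximizing in $\tau$ can be checked numerically. The following Python sketch uses a crude grid search (a technique chosen here only for transparency, not the method of the text) on both log-likelihoods; the data values and the helper `grid_argmax` are illustrative assumptions.

```python
import math

def poisson_loglik(theta, xs):
    """ln L(theta) = -n*theta + sum(x) ln(theta) - ln(prod x!) for a POI(theta) sample."""
    return (-len(xs) * theta + sum(xs) * math.log(theta)
            - sum(math.lgamma(x + 1) for x in xs))

def poisson_loglik_tau(tau, xs):
    """ln L*(tau) after reparameterizing by theta = -ln(tau), with 0 < tau < 1."""
    return poisson_loglik(-math.log(tau), xs)

def grid_argmax(fn, lo, hi, steps=20_000):
    """Crude grid search for a maximizer of fn on the open interval (lo, hi)."""
    grid = (lo + (hi - lo) * k / steps for k in range(1, steps))
    return max(grid, key=fn)

xs = [2, 0, 3, 1, 4, 2]                 # illustrative counts, xbar = 2.0
xbar = sum(xs) / len(xs)
theta_hat = grid_argmax(lambda t: poisson_loglik(t, xs), 0.0, 10.0)
tau_hat = grid_argmax(lambda t: poisson_loglik_tau(t, xs), 0.0, 1.0)
print(theta_hat, xbar)                  # grid maximizer vs closed form xbar
print(tau_hat, math.exp(-xbar))         # grid maximizer vs e^(-xbar)
```

Up to the grid resolution, $\hat{\theta} \approx \bar{x}$ and $\hat{\tau} \approx e^{-\bar{x}} = \tau(\hat{\theta})$, as the derivation predicts.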
It should be noted that we are using the notation $\tau$ to represent both a function of $\theta$ and a value in the range of the function. This is a slight misuse of standard mathematical notation, but it is a convenient practice that often is used in problems involving reparameterization.

In general, if $u$ is some one-to-one function with inverse $u^{-1}$, and if $\tau = u(\theta)$, then we can define $L^*(\tau) = L(u^{-1}(\tau))$. It follows that $\hat{\tau}$ will maximize $L^*(\tau)$ if $u^{-1}(\hat{\tau}) = \hat{\theta}$, or equivalently, $\hat{\tau} = u(\hat{\theta})$. When $u$ is not one-to-one, there is no unique solution to $\tau = u(\theta)$ for every value of $\tau$. The usual approach in this case is to extend the definition of the function $L^*(\tau)$. For example, if for every value $\tau$, $L(\theta)$ attains a maximum over the subset of $\Omega$ such that $\tau = u(\theta)$, then we define $L^*(\tau)$ to be this maximum value. This generalizes the reparameterized likelihood function to cases where $u$ is not one-to-one, and it follows that $\hat{\tau} = u(\hat{\theta})$ maximizes $L^*(\tau)$ when $\hat{\theta}$ maximizes $L(\theta)$. (See Exercise 43.) These results are summarized in the following theorem.


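Theorem 9.2.1 Invariance Property If $\hat{\theta}$ is the MLE of $\theta$ and if $u(\theta)$ is a function of $\theta$, then $u(\hat{\theta})$ is an MLE of $u(\theta)$.


In other words, if we reparameterize by $\tau = \tau(\theta)$, then the MLE of $\tau$ is $\hat{\tau} = \tau(\hat{\theta})$.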


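Consider a random sample from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$. The likelihood function for a sample of size $n$ is

$$L(\theta) = \frac{1}{\theta^n}e^{-\sum x_i/\theta}, \qquad 0 < x_i < \infty$$

Thus,

$$\ln L(\theta) = -n\ln\theta - \frac{\sum x_i}{\theta} \qquad \text{and} \qquad \frac{d}{d\theta}\ln L(\theta) = -\frac{n}{\theta} + \frac{\sum x_i}{\theta^2}$$

Equating this derivative to zero gives $\hat{\theta} = \bar{x}$.
If we wish to estimate $p(\theta) = P(X > 1) = e^{-1/\theta}$, then we know from Theorem 9.2.1 that the MLE is $p(\hat{\theta}) = e^{-1/\bar{x}}$.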
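To see the invariance property at work, the following Python sketch computes $p(\hat{\theta}) = e^{-1/\bar{x}}$ and compares it with a brute-force grid maximization of the likelihood written directly in terms of $p$. The data values and the grid size are illustrative assumptions.

```python
import math

def exp_loglik(theta, xs):
    """ln L(theta) = -n ln(theta) - sum(x)/theta for an EXP(theta) sample."""
    return -len(xs) * math.log(theta) - sum(xs) / theta

def exp_loglik_p(p, xs):
    """The same likelihood written in p = exp(-1/theta), i.e. theta = -1/ln(p), 0 < p < 1."""
    return exp_loglik(-1.0 / math.log(p), xs)

xs = [0.5, 1.5, 2.0, 4.0]               # illustrative data, xbar = 2.0
xbar = sum(xs) / len(xs)                # closed-form MLE of theta
p_invariance = math.exp(-1.0 / xbar)    # MLE of p via the invariance property

# Check: maximize ln L*(p) directly over a grid on (0, 1).
steps = 100_000
p_grid = max((k / steps for k in range(1, steps)), key=lambda p: exp_loglik_p(p, xs))
print(xbar, p_invariance, p_grid)
```

The grid maximizer agrees with $e^{-1/\bar{x}}$ to within the grid spacing, without any derivative of $\ln L^*(p)$ being computed.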


There are cases where the MLE exists but cannot be obtained as a solution of the ML equation.


Consider a random sample from a two-parameter exponential distribution, $X_i \sim \mathrm{EXP}(1, \eta)$. The likelihood function is $L(\eta) = \exp[-\sum(x_i - \eta)]$ if all $x_i > \eta$ and zero otherwise. If we denote the minimum of $x_1, \ldots, x_n$ by $x_{1:n}$, then we can write $L(\eta) = \exp[n(\eta - \bar{x})]$ if $x_{1:n} > \eta$ and zero otherwise. The graph of $L(\eta)$ is shown in Figure 9.1.


FIGURE 9.1 The likelihood function for a random sample from EXP(1, η) [plot of $L(\eta)$]


It is clear from the figure that $L(\eta)$ is maximized at $\hat{\eta} = x_{1:n}$, and the ML estimator is the first order statistic. This is an example where the MLE and the MME are different. (See Example 9.2.2.)
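A short Python sketch contrasts the two estimators of $\eta$ and shows why no derivative is needed: the log-likelihood increases in $\eta$ up to the smallest observation and drops to $-\infty$ beyond it. This sketch evaluates the likelihood at the boundary point $\eta = x_{1:n}$, as the figure does; the data and function names are illustrative assumptions.

```python
import math

def shifted_exp_loglik(eta, xs):
    """ln L(eta) for EXP(1, eta); -inf once eta exceeds the smallest observation."""
    if eta > min(xs):
        return -math.inf
    return -sum(x - eta for x in xs)   # = n (eta - xbar), increasing in eta

def eta_mle(xs):
    return min(xs)                     # first order statistic x_{1:n}

def eta_mme(xs):
    return sum(xs) / len(xs) - 1.0     # solves xbar = 1 + eta

xs = [3.2, 2.5, 4.1, 2.9]              # illustrative data
print(eta_mle(xs), eta_mme(xs))
```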


As noted earlier, the ML principle can be used in situations where the observed 
variables are not independent or identically distributed. 


The lifetime of a certain component follows an exponential distribution, EXP(θ). Suppose that $n$ components are randomly selected and placed on test, and the first $r$ observed failure times are denoted by $x_{1:n}, \ldots, x_{r:n}$. From equation (6.5.12) the joint density of $X_{1:n}, \ldots, X_{r:n}$ is given by

$$\begin{aligned}
L(\theta) &= f(x_{1:n}, \ldots, x_{r:n}; \theta) \\
&= \frac{n!}{(n-r)!}\left[\prod_{i=1}^r \frac{1}{\theta}e^{-x_{i:n}/\theta}\right]\left[e^{-x_{r:n}/\theta}\right]^{n-r} \\
&= \frac{n!}{(n-r)!\,\theta^r}\exp\left[-\frac{\sum_{i=1}^r x_{i:n} + (n-r)x_{r:n}}{\theta}\right]
\end{aligned}$$

Note that $T = \sum_{i=1}^r X_{i:n} + (n-r)X_{r:n}$ represents the total survival time of the $n$ items on test until the experiment is terminated. To obtain the MLE of $\theta$ based on these data, we have

$$\ln L(\theta) = \text{const} - r\ln\theta - \frac{T}{\theta}, \qquad \frac{d}{d\theta}\ln L(\theta) = -\frac{r}{\theta} + \frac{T}{\theta^2}$$

Setting the derivative equal to zero gives $\hat{\theta} = T/r$.
If the complete sample is observed, then $r = n$ and $\hat{\theta} = \bar{x}$ as before.
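The estimate $\hat{\theta} = T/r$ is easy to compute once the total time on test is accumulated. In the following Python sketch the failure times and $n = 10$ are illustrative assumptions; the function name is hypothetical.

```python
def censored_exp_mle(failure_times, n):
    """MLE theta_hat = T / r for EXP(theta) when testing stops at the r-th of n failures."""
    x = sorted(failure_times)
    r = len(x)
    total_time = sum(x) + (n - r) * x[-1]   # T = sum of x_{i:n} + (n - r) x_{r:n}
    return total_time / r

print(censored_exp_mle([1.2, 2.0, 3.1, 4.5], n=10))
```

Here $T = 10.8 + 6(4.5) = 37.8$ and $r = 4$, so $\hat{\theta} = 9.45$; the $n - r = 6$ unfailed items each contribute the running time $x_{r:n}$. With $r = n$ the function returns the sample mean.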




The previous examples have involved distributions with one unknown parameter. The definitions of likelihood function and maximum likelihood estimator can be applied in the case of more than one unknown parameter if $\theta$ represents a vector of parameters, say $\theta = (\theta_1, \ldots, \theta_k)$. Although $\Omega$ could, in general, be almost any sort of $k$-dimensional set, in most examples it is a Cartesian product of $k$ intervals. When $\Omega$ is of this form and if the partial derivatives of $L(\theta_1, \ldots, \theta_k)$ exist, and the MLEs do not occur on the boundary of $\Omega$, then the MLEs will be solutions of the simultaneous equations

$$\frac{\partial}{\partial\theta_j}\ln L(\theta_1, \ldots, \theta_k) = 0 \qquad (9.2.8)$$

for $j = 1, \ldots, k$. These are called the maximum likelihood equations, and the solutions are denoted by $\hat{\theta}_1, \ldots, \hat{\theta}_k$. As in the one-parameter case, it generally is necessary to verify that the solutions of the ML equations maximize $L(\theta_1, \ldots, \theta_k)$.


Theorem 9.2.2 Invariance Property If $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_k)$ denotes the MLE of $\theta = (\theta_1, \ldots, \theta_k)$, then the MLE of $\tau = (\tau_1(\theta), \ldots, \tau_r(\theta))$ is $\hat{\tau} = (\hat{\tau}_1, \ldots, \hat{\tau}_r) = (\tau_1(\hat{\theta}), \ldots, \tau_r(\hat{\theta}))$ for $1 \le r \le k$.


The situation here is similar to the case of a single parameter. If $\tau$ represents a one-to-one transformation, then a reparameterized likelihood function can be defined, and the MLE of $\tau$ is obtained as the transformation of the MLE of $\theta$. In the case of a transformation that is not one-to-one, the likelihood function relative to $\tau$ must be extended in a manner similar to the single-parameter case.

Note that the multiparameter estimators often are not the same as the individual estimators when the other parameters are assumed to be known. This is illustrated by the following example.


Example 9.2.10 For a set of random variables $X_i \sim N(\mu, \sigma^2)$, the MLEs of $\mu$ and $\theta = \sigma^2$ based on a random sample of size $n$ are desired. We have


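$$f(x; \mu, \theta) = \frac{1}{\sqrt{2\pi\theta}}e^{-(x-\mu)^2/(2\theta)}$$

$$L(\mu, \theta) = (2\pi\theta)^{-n/2}\exp\left[-\frac{\sum(x_i - \mu)^2}{2\theta}\right]$$

$$\ln L(\mu, \theta) = \text{const} - \frac{n}{2}\ln\theta - \frac{\sum(x_i - \mu)^2}{2\theta}$$

$$\frac{\partial}{\partial\mu}\ln L(\mu, \theta) = \frac{2\sum(x_i - \mu)}{2\theta}$$

and

$$\frac{\partial}{\partial\theta}\ln L(\mu, \theta) = -\frac{n}{2\theta} + \frac{\sum(x_i - \mu)^2}{2\theta^2}$$

Setting these derivatives equal to zero and solving simultaneously for the solution values $\hat{\mu}$ and $\hat{\theta}$ yields the MLEs

$$\hat{\mu} = \bar{x}, \qquad \hat{\theta} = \hat{\sigma}^2 = \frac{\sum(x_i - \bar{x})^2}{n}$$

The ML method enjoys the invariance property in the two-parameter normal case. For example, if the likelihood function is maximized with respect to $\mu$ and $\sigma$, then one obtains $\hat{\mu} = \bar{x}$ and $\hat{\sigma} = \sqrt{\hat{\theta}}$, and similarly for other functions of the parameters. Notice that if $\theta = \theta_0$ is known, then the first ML equation

$$\frac{2\sum(x_i - \mu)}{2\theta_0} = 0$$

yields $\hat{\mu} = \bar{x}$ as before, but if $\mu = \mu_0$ is assumed known, then the second ML equation

$$-\frac{n}{2\theta} + \frac{\sum(x_i - \mu_0)^2}{2\theta^2} = 0$$

yields

$$\hat{\theta} = \frac{\sum(x_i - \mu_0)^2}{n}$$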
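A minimal Python sketch of these estimators, assuming a plain list of observations; note the divisor $n$ rather than $n - 1$. The data values and function names are illustrative.

```python
from statistics import fmean

def normal_mle(xs):
    """Joint MLEs (mu_hat, theta_hat) for N(mu, theta), theta = sigma^2; note divisor n."""
    mu_hat = fmean(xs)
    theta_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)
    return mu_hat, theta_hat

def normal_var_mle_known_mean(xs, mu0):
    """MLE of theta = sigma^2 when the mean mu = mu0 is known."""
    return sum((x - mu0) ** 2 for x in xs) / len(xs)

xs = [2, 4, 4, 4, 5, 5, 7, 9]           # illustrative data
mu_hat, theta_hat = normal_mle(xs)
sigma_hat = theta_hat ** 0.5            # invariance: MLE of sigma is sqrt(theta_hat)
print(mu_hat, theta_hat, sigma_hat)     # 5.0 4.0 2.0
print(normal_var_mle_known_mean(xs, mu0=4))  # 5.0
```

The two variance estimates differ (4.0 versus 5.0) because the known-mean version measures spread about $\mu_0$ rather than about $\bar{x}$.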


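Consider a random sample from a two-parameter exponential distribution with both parameters unknown, $X_i \sim \mathrm{EXP}(\theta, \eta)$. The population pdf is

$$f(x; \theta, \eta) = \frac{1}{\theta}e^{-(x-\eta)/\theta}, \qquad x > \eta$$

The likelihood function is

$$L(\theta, \eta) = \frac{1}{\theta^n}\exp\left[-\frac{\sum(x_i - \eta)}{\theta}\right], \qquad \text{all } x_i > \eta$$

and the log-likelihood is

$$\ln L(\theta, \eta) = -n\ln\theta - \frac{\sum(x_i - \eta)}{\theta}, \qquad x_{1:n} > \eta$$

where $x_{1:n}$ is the minimum of $x_1, \ldots, x_n$. As in Example 9.2.8, the likelihood is maximized with respect to $\eta$ by taking $\hat{\eta} = x_{1:n}$. To maximize relative to $\theta$, we may differentiate $\ln L(\theta, \hat{\eta})$ with respect to $\theta$ and solve the resulting equation,

$$\frac{\partial}{\partial\theta}\ln L(\theta, \hat{\eta}) = -\frac{n}{\theta} + \frac{\sum(x_i - \hat{\eta})}{\theta^2} = 0$$

which yields

$$\hat{\theta} = \frac{\sum(x_i - \hat{\eta})}{n} = \bar{x} - x_{1:n}$$

The $\alpha$ percentile, $x_\alpha$, such that $F(x_\alpha) = \alpha$, is given by $x_\alpha = -\theta\ln(1 - \alpha) + \eta$, and the MLE of $x_\alpha$ is $\hat{x}_\alpha = -\hat{\theta}\ln(1 - \alpha) + \hat{\eta}$ by Theorem 9.2.2.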
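The two MLEs and the plug-in percentile estimate can be sketched in Python as follows; the sample values and function names are illustrative assumptions.

```python
import math

def exp2_mle(xs):
    """MLEs (theta_hat, eta_hat) for the two-parameter exponential EXP(theta, eta)."""
    eta_hat = min(xs)                          # first order statistic x_{1:n}
    theta_hat = sum(xs) / len(xs) - eta_hat    # xbar - x_{1:n}
    return theta_hat, eta_hat

def exp2_percentile_mle(xs, alpha):
    """MLE of x_alpha = -theta ln(1 - alpha) + eta, by the invariance property."""
    theta_hat, eta_hat = exp2_mle(xs)
    return -theta_hat * math.log(1.0 - alpha) + eta_hat

xs = [5.0, 6.0, 7.5, 9.5]                     # illustrative data
print(exp2_mle(xs))                           # (2.0, 5.0)
print(exp2_percentile_mle(xs, 0.5))           # estimated median, 5 + 2 ln 2
```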


Let us consider ML estimation for the parameters of a gamma distribution, GAM(θ, κ), based on a random sample of size $n$. We have

$$L(\theta, \kappa) = \frac{1}{\theta^{n\kappa}[\Gamma(\kappa)]^n}\left(\prod x_i\right)^{\kappa - 1}\exp\left(-\sum x_i/\theta\right)$$

and

$$\ln L(\theta, \kappa) = -n\kappa\ln\theta - n\ln\Gamma(\kappa) + (\kappa - 1)\ln\prod x_i - \sum x_i/\theta$$

The partial derivatives are

$$\frac{\partial}{\partial\theta}\ln L(\theta, \kappa) = -\frac{n\kappa}{\theta} + \frac{\sum x_i}{\theta^2}$$

$$\frac{\partial}{\partial\kappa}\ln L(\theta, \kappa) = -n\ln\theta - n\Gamma'(\kappa)/\Gamma(\kappa) + \ln\prod x_i$$

If we let $\tilde{x} = (\prod x_i)^{1/n}$ denote the geometric mean of the sample and let $\Psi(\kappa) = \Gamma'(\kappa)/\Gamma(\kappa)$ denote the psi function, then setting the derivatives equal to zero gives the equations

$$\hat{\theta} = \bar{x}/\hat{\kappa}$$

$$\ln(\hat{\kappa}) - \Psi(\hat{\kappa}) - \ln(\bar{x}/\tilde{x}) = 0$$

This provides an example where the ML equations cannot be solved in closed form, although a numerical solution for $\hat{\kappa}$ can be obtained from the last equation; for example, by using tables of the psi function. We see that $\hat{\kappa}$ is a function of $\bar{x}/\tilde{x}$ and is not a function of $\bar{x}$, $\tilde{x}$, and $n$ separately. Thus it is convenient to provide tables of $\hat{\kappa}$ in terms of $\bar{x}/\tilde{x}$. Perhaps the best approach to ML estimation for the gamma distribution is the use of the following rational approximation [Greenwood and Durand (1960)]:

$$\hat{\kappa} \approx \begin{cases} \dfrac{0.5000876 + 0.1648852M - 0.0544274M^2}{M}, & 0 < M \le 0.5772 \\[2ex] \dfrac{8.898919 + 9.059950M + 0.9775373M^2}{M(17.79728 + 11.968477M + M^2)}, & 0.5772 < M \le 17 \end{cases}$$

where $M = \ln(\bar{x}/\tilde{x})$. A separate approximation applies for $M > 17$.

The MLEs are not the same as the MMEs, but the ML estimate of the mean is $\hat{\mu} = \hat{\kappa}\hat{\theta} = \bar{x}$.
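The following Python sketch applies the two displayed cases of the Greenwood and Durand approximation and then checks that $\hat{\kappa}$ nearly satisfies $\ln\hat{\kappa} - \Psi(\hat{\kappa}) = M$. Python's standard `math` module has no psi (digamma) function, so this sketch supplies its own, assumed here to be adequate: the recurrence $\Psi(x) = \Psi(x+1) - 1/x$ followed by the standard asymptotic series. It implements only $0 < M \le 17$; the data values and function names are illustrative.

```python
import math
from statistics import fmean, geometric_mean

def digamma(x):
    """Psi function for x > 0: shift up with psi(x) = psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def kappa_greenwood_durand(m):
    """Greenwood-Durand rational approximation to the root of ln(k) - psi(k) = m, 0 < m <= 17."""
    if 0 < m <= 0.5772:
        return (0.5000876 + 0.1648852 * m - 0.0544274 * m * m) / m
    if 0.5772 < m <= 17:
        return (8.898919 + 9.059950 * m + 0.9775373 * m * m) / (
            m * (17.79728 + 11.968477 * m + m * m))
    raise ValueError("this sketch covers only 0 < M <= 17")

def gamma_mle(xs):
    """Approximate MLEs (theta_hat, kappa_hat) for GAM(theta, kappa); data must be positive."""
    xbar = fmean(xs)
    m = math.log(xbar / geometric_mean(xs))   # M = ln(xbar / geometric mean)
    kappa_hat = kappa_greenwood_durand(m)
    return xbar / kappa_hat, kappa_hat

xs = [1.0, 2.0, 3.0, 4.0]                     # illustrative data
theta_hat, kappa_hat = gamma_mle(xs)
m = math.log(fmean(xs) / geometric_mean(xs))
print(theta_hat, kappa_hat)
print(math.log(kappa_hat) - digamma(kappa_hat), m)   # should nearly agree
```

As the text notes, the fitted mean $\hat{\kappa}\hat{\theta}$ equals $\bar{x}$ exactly, whatever the accuracy of the approximation for $\hat{\kappa}$.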


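It is also possible to have solutions to the ML equations that can be obtained only by numerical methods.


Consider a random sample of size $n$ from a Weibull distribution with both scale and shape parameters unknown, $X_i \sim \mathrm{WEI}(\theta, \beta)$. The population pdf is

$$f(x; \theta, \beta) = (\beta/\theta)(x/\theta)^{\beta - 1}\exp[-(x/\theta)^\beta]$$

for $x > 0$, $\theta > 0$, and $\beta > 0$, and the log-likelihood function is

$$\ln L(\theta, \beta) = n\ln(\beta/\theta) + (\beta - 1)\sum\ln(x_i/\theta) - \sum(x_i/\theta)^\beta$$

which leads to the ML equations

$$\frac{\partial}{\partial\theta}\ln L(\theta, \beta) = -n\beta/\theta + (\beta/\theta)\sum(x_i/\theta)^\beta = 0$$

$$\frac{\partial}{\partial\beta}\ln L(\theta, \beta) = n/\beta + \sum\ln(x_i/\theta) - \sum(x_i/\theta)^\beta\ln(x_i/\theta) = 0$$

After some algebra, the MLEs are the solutions $\beta = \hat{\beta}$ and $\theta = \hat{\theta}$ of the equations

$$g(\beta) = \frac{\sum x_i^\beta\ln x_i}{\sum x_i^\beta} - \frac{1}{\beta} - \frac{\sum\ln x_i}{n} = 0$$

$$\hat{\theta} = \left(\sum x_i^{\hat{\beta}}/n\right)^{1/\hat{\beta}}$$

The equation $g(\beta) = 0$ cannot be solved explicitly as a function of the data, but for a given set of data it is possible to solve for $\hat{\beta}$ by an iterative numerical method such as the Newton-Raphson method. Specifically, we can define a sequence $\beta_1, \beta_2, \ldots$ such that

$$\beta_m = \beta_{m-1} - \frac{g(\beta_{m-1})}{g'(\beta_{m-1})}$$

where $\beta_0 > 0$ is an initial value, $g'(\beta)$ is the derivative of $g(\beta)$, and $\beta_m \to \hat{\beta}$ as $m \to \infty$.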
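The Newton-Raphson iteration for $g(\beta) = 0$ can be sketched in Python as below. The derivative used is $g'(\beta) = (s_2 s_0 - s_1^2)/s_0^2 + 1/\beta^2$, where $s_j = \sum x_i^\beta(\ln x_i)^j$; it is positive by the Cauchy-Schwarz inequality, so the root is unique. The starting value $\beta_0 = 1$, the tolerance, the data, and the halving safeguard against non-positive iterates are assumptions of this sketch, not part of the text.

```python
import math

def _weighted_sums(beta, xs):
    """Return sum x^b, sum x^b ln x, sum x^b (ln x)^2 for b = beta."""
    s0 = s1 = s2 = 0.0
    for x in xs:
        w, lx = x ** beta, math.log(x)
        s0 += w
        s1 += w * lx
        s2 += w * lx * lx
    return s0, s1, s2

def weibull_g(beta, xs):
    """g(beta) = sum x^b ln x / sum x^b - 1/b - mean(ln x); its root is beta_hat."""
    s0, s1, _ = _weighted_sums(beta, xs)
    return s1 / s0 - 1.0 / beta - sum(math.log(x) for x in xs) / len(xs)

def weibull_g_prime(beta, xs):
    """g'(beta) = (s2*s0 - s1^2)/s0^2 + 1/beta^2 (always positive)."""
    s0, s1, s2 = _weighted_sums(beta, xs)
    return (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (beta * beta)

def weibull_mle(xs, beta0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for beta_hat, then theta_hat = (sum x^beta_hat / n)^(1/beta_hat)."""
    beta = beta0
    for _ in range(max_iter):
        new_beta = beta - weibull_g(beta, xs) / weibull_g_prime(beta, xs)
        if new_beta <= 0:          # safeguard added here (not in the text): stay in beta > 0
            new_beta = beta / 2
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    theta = (sum(x ** beta for x in xs) / len(xs)) ** (1.0 / beta)
    return theta, beta

xs = [1.2, 0.8, 2.5, 1.7, 3.1, 0.9, 1.4, 2.2]   # illustrative data
theta_hat, beta_hat = weibull_mle(xs)
print(theta_hat, beta_hat)
```

At the returned values, the first ML equation holds by construction, because $\sum(x_i/\hat{\theta})^{\hat{\beta}} = n$.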


Some large-sample properties of MLEs will be discussed in Section 9.4, and 
additional methods of estimation will be presented in Section 9.5. 
CRITERIA FOR EVALUATING ESTIMATORS


Several properties of estimators would appear to be desirable, including unbiasedness.


Definition 9.3.1
Unbiased Estimator An estimator $T$ is said to be an unbiased estimator of $\tau(\theta)$ if

$$E(T) = \tau(\theta) \qquad (9.3.1)$$

for all $\theta \in \Omega$. Otherwise, we say that $T$ is a biased estimator of $\tau(\theta)$.


If an unbiased estimator is used to assign a value to $\tau(\theta)$, then the correct value of $\tau(\theta)$ may not be achieved by any given estimate, $t$, but the “average” value of $T$ will be $\tau(\theta)$.


Example 9.3.1 Consider a random sample from a distribution $f(x; \theta)$ with $\theta = (\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and variance of the population. It was shown in Section 8.2 that the sample mean and variance, $\bar{X}$ and $S^2$, are unbiased estimators of $\mu$ and $\sigma^2$, respectively. If both $\mu$ and $\sigma^2$ are unknown, then the appropriate parameter space is a subset of two-dimensional Euclidean space. In particular, $\Omega$ is the Cartesian product of the intervals $(-\infty, \infty)$ and $(0, \infty)$; $\Omega = (-\infty, \infty) \times (0, \infty)$. If only one parameter is unknown, then $\Omega$ will consist of the corresponding one-dimensional set. For example, suppose the population is normal with unknown mean, $\mu$, but known variance $\sigma^2 = 9$. The appropriate parameter space is $\Omega = (-\infty, \infty)$, because in general for the mean of a normal distribution, $-\infty < \mu < \infty$.

We may desire to estimate a percentile, say the 95th percentile, of the distribution $N(\mu, 9)$. This is an example of a function of the parameter, because $\tau(\mu) = \mu + \sigma z_{0.95} = \mu + 4.95$. It follows that $T = \bar{X} + 4.95$ is an unbiased estimator of $\tau(\mu)$, because $E(T) = E(\bar{X} + 4.95) = E(\bar{X}) + 4.95 = \mu + 4.95$, regardless of the value of $\mu$.


It is possible to have a reasonable estimator that is biased, and often an esti- 
mator can be adjusted to make it unbiased. 


Example 9.3.2 Consider a random sample of size $n$ from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$, with parameter $\theta$. Because $\theta$ is the mean of the distribution, we know that the MLE, $\bar{X}$, is unbiased for $\theta$. If we wish to estimate the reciprocal of the mean, $\tau(\theta) = 1/\theta$, then by the invariance property the MLE is $T_1 = 1/\bar{X}$. However, $T_1$ is a biased estimator of $1/\theta$, which follows from results in Chapter 8. In particular, from Theorems 8.3.3 and 8.3.4, we know that

$$Y = \frac{2n\bar{X}}{\theta} \sim \chi^2(2n)$$

Furthermore, it follows from equation (8.3.6) with $r = -1$ that $E(Y^{-1}) = 1/[2(n-1)]$, and consequently $E(T_1) = [n/(n-1)](1/\theta)$. Although this shows that $T_1$ is a biased estimator of $1/\theta$, it also follows that an adjusted estimator of the form $cT_1$, where $c = (n-1)/n$, is unbiased for $1/\theta$. This also suggests that while the unadjusted estimator, $T_1$, is biased, it still might be reasonable because the amount of bias, $1/[(n-1)\theta]$, is small for large $n$.

It is not always possible to adjust a biased estimator in this manner. For example, suppose that it is desired to estimate $1/\theta$ using only the smallest observation, which would correspond to observing the first order statistic, $X_{1:n}$. It was shown in Example 7.2.2 that $X_{1:n} \sim \mathrm{EXP}(\theta/n)$, and consequently $nX_{1:n}$ is another example of an unbiased estimator of $\theta$. This suggests that $T_2 = 1/(nX_{1:n})$ also could be used to estimate $1/\theta$. The statistic $T_2$ cannot be adjusted in the above manner to be unbiased for $1/\theta$, because $E(T_2)$ does not even exist.

The statistics $T_1$ and $T_2$ illustrate a possible flaw in the concept of unbiasedness as a general principle. In particular, if $\hat{\theta}$ is an unbiased estimator of $\theta$, then $\tau(\hat{\theta})$ will not necessarily be an unbiased estimator of $\tau(\theta)$. Yet $\tau(\hat{\theta})$ may be a reasonable estimator of $\tau(\theta)$, such as the case when $\hat{\theta} = \bar{X}$ and $\tau(\hat{\theta}) = 1/\bar{X}$.
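The bias of $T_1 = 1/\bar{X}$ and the effect of the factor $c = (n-1)/n$ can be seen in a small Monte Carlo experiment. In this Python sketch, the choices $\theta = 2$, $n = 5$, the replication count, and the seed are illustrative assumptions; `random.expovariate` takes the rate $1/\theta$.

```python
import random
from statistics import fmean

def simulate_reciprocal_estimators(theta, n, reps, seed=1):
    """Monte Carlo means of T1 = 1/Xbar and the adjusted cT1 = (n - 1)/(n Xbar) for EXP(theta)."""
    rng = random.Random(seed)
    t1_values = []
    for _ in range(reps):
        xbar = fmean(rng.expovariate(1.0 / theta) for _ in range(n))
        t1_values.append(1.0 / xbar)
    mean_t1 = fmean(t1_values)
    return mean_t1, (n - 1) / n * mean_t1

theta, n = 2.0, 5
mean_t1, mean_ct1 = simulate_reciprocal_estimators(theta, n, reps=100_000)
print(mean_t1, n / ((n - 1) * theta))   # simulated mean vs theoretical E(T1) = 0.625
print(mean_ct1, 1 / theta)              # simulated mean vs target 1/theta = 0.5
```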


It often is possible to derive several different potential estimators of a param- 
eter. For example, in some cases the MLEs and the MMEs have basically differ- 
ent forms. This raises the obvious question of how to decide which estimators are 
“best” in some sense, and this question will be discussed next. 

A very general idea is to select the estimator that tends to be closest or “most concentrated” around the true value of the parameter. It might be reasonable to say that $T_1$ is more concentrated than $T_2$ about $\tau(\theta)$ if

$$P[\tau(\theta) - \varepsilon < T_1 < \tau(\theta) + \varepsilon] \ge P[\tau(\theta) - \varepsilon < T_2 < \tau(\theta) + \varepsilon] \qquad (9.3.2)$$

for all $\varepsilon > 0$, and that an estimator is most concentrated if it is more concentrated than any other estimator.
The idea of a more concentrated estimator is illustrated in Figure 9.2, which shows the pdf's of two estimators $T_1$ and $T_2$.

FIGURE 9.2 The concept of “more concentrated” [pdf's of $T_1$ and $T_2$; horizontal axis marks $\tau(\theta) - \varepsilon$, $\tau(\theta)$, $\tau(\theta) + \varepsilon$]

It is not clear how to obtain an estimator that is most concentrated, but some other concepts will be discussed that may partially achieve this goal. For example, if $T$ is an unbiased estimator of $\tau(\theta)$, it follows from the Chebychev inequality that

$$P[\tau(\theta) - \varepsilon < T < \tau(\theta) + \varepsilon] \ge 1 - \mathrm{Var}(T)/\varepsilon^2 \qquad (9.3.3)$$

for all $\varepsilon > 0$. This suggests that for unbiased estimators, one with a smaller variance will tend to be more concentrated and thus may be preferable.


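Example 9.3.3 Let us reconsider Example 9.3.2, where we are interested only in estimating the mean, $\theta$. If $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = nX_{1:n}$, then both estimators are unbiased for $\theta$, but $\mathrm{Var}(\hat{\theta}_1) = \theta^2/n$ and $\mathrm{Var}(\hat{\theta}_2) = \theta^2$. Thus, for $n > 1$, $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$ for all $\theta > 0$, and $\hat{\theta}_1$ is the better estimator by this criterion.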
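A Monte Carlo sketch in Python makes the comparison concrete by estimating both variances and the probability in (9.3.2) for a fixed $\varepsilon$. The values $\theta = 1$, $n = 10$, $\varepsilon = 0.25$, the replication count, and the seed are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance

def compare_mean_estimators(theta, n, reps, eps, seed=2):
    """Simulate theta1 = Xbar and theta2 = n * X_{1:n} for EXP(theta) samples."""
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(reps):
        xs = [rng.expovariate(1.0 / theta) for _ in range(n)]
        t1.append(fmean(xs))
        t2.append(n * min(xs))

    def coverage(ts):  # empirical P[theta - eps < T < theta + eps]
        return sum(theta - eps < t < theta + eps for t in ts) / reps

    return pvariance(t1), pvariance(t2), coverage(t1), coverage(t2)

v1, v2, c1, c2 = compare_mean_estimators(theta=1.0, n=10, reps=50_000, eps=0.25)
print(v1, v2)   # near theta^2/n = 0.1 and theta^2 = 1.0
print(c1, c2)   # the lower-variance estimator lands within eps more often
```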


In some cases one estimator may have a smaller variance for some values of $\theta$ and a larger variance for other values of $\theta$. In such a case neither estimator can be said to be better than the other in general. In certain cases it is possible to show that a particular unbiased estimator has the smallest possible variance among all possible unbiased estimators for all values of $\theta$. In such a case one could restrict attention to that particular estimator.


UNIFORMLY MINIMUM VARIANCE UNBIASED ESTIMATORS 


Definition 9.3.2 


Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from $f(x; \theta)$. An estimator $T^*$ of $\tau(\theta)$ is called a uniformly minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$ if


1. $T^*$ is unbiased for $\tau(\theta)$, and
2. for any other unbiased estimator $T$ of $\tau(\theta)$, $\mathrm{Var}(T^*) \le \mathrm{Var}(T)$ for all $\theta \in \Omega$.
In some cases, lower bounds can be derived for the variance of unbiased estimators. If an unbiased estimator can be found that attains such a lower bound, then it follows that the estimator is a UMVUE. In the following discussion, if the appropriate derivatives exist and can be passed under the integral sign (or summation), then a lower bound for the variance of an unbiased estimator can be established. Among other things, this will require that the domain of the integrand must not depend on $\theta$.

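If $T$ is an unbiased estimator of $\tau(\theta)$, then the Cramér-Rao lower bound (CRLB), based on a random sample, is

$$\mathrm{Var}(T) \ge \frac{[\tau'(\theta)]^2}{nE\left[\frac{\partial}{\partial\theta}\ln f(X; \theta)\right]^2} \qquad (9.3.4)$$

Assuming differentiability conditions as mentioned earlier, the CRLB can be developed as follows. We will assume the case of sampling from a continuous distribution. The discrete case is similar.

Consider the function defined by

$$u(x_1, \ldots, x_n; \theta) = \frac{\partial}{\partial\theta}\ln f(x_1, \ldots, x_n; \theta)$$

which also can be written

$$u(x_1, \ldots, x_n; \theta) = \frac{1}{f(x_1, \ldots, x_n; \theta)}\,\frac{\partial}{\partial\theta}f(x_1, \ldots, x_n; \theta) \qquad (9.3.5)$$

If we define a random variable $U = u(X_1, \ldots, X_n; \theta)$, then

$$\begin{aligned}
E(U) &= \int\cdots\int u(x_1, \ldots, x_n; \theta)\,f(x_1, \ldots, x_n; \theta)\,dx_1\cdots dx_n \\
&= \int\cdots\int \frac{\partial}{\partial\theta}f(x_1, \ldots, x_n; \theta)\,dx_1\cdots dx_n \\
&= \frac{\partial}{\partial\theta}\int\cdots\int f(x_1, \ldots, x_n; \theta)\,dx_1\cdots dx_n \\
&= \frac{\partial}{\partial\theta}(1) = 0
\end{aligned}$$

Note also that if $T = t(X_1, \ldots, X_n)$ is unbiased for $\tau(\theta)$, then

$$\tau(\theta) = E(T) = \int\cdots\int t(x_1, \ldots, x_n)\,f(x_1, \ldots, x_n; \theta)\,dx_1\cdots dx_n$$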
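If we differentiate with respect to $\theta$, then

$$\begin{aligned}
\tau'(\theta) &= \int\cdots\int t(x_1, \ldots, x_n)\,\frac{\partial}{\partial\theta}f(x_1, \ldots, x_n; \theta)\,dx_1\cdots dx_n \\
&= \int\cdots\int t(x_1, \ldots, x_n)\,u(x_1, \ldots, x_n; \theta)\,f(x_1, \ldots, x_n; \theta)\,dx_1\cdots dx_n \\
&= E(TU)
\end{aligned}$$

It also follows, from equations (2.4.6) and (5.2.11) and the fact that $E(U) = 0$, that $\mathrm{Var}(U) = E(U^2)$ and $\mathrm{Cov}(T, U) = E(TU)$.

Because the correlation coefficient is always between $\pm 1$ [see equation (5.3.2)], it follows that $[\mathrm{Cov}(T, U)]^2 \le \mathrm{Var}(T)\,\mathrm{Var}(U)$, and consequently $\mathrm{Var}(T)\,E(U^2) \ge [\tau'(\theta)]^2$, so that

$$\mathrm{Var}(T) \ge \frac{[\tau'(\theta)]^2}{E\left[\frac{\partial}{\partial\theta}\ln f(X_1, \ldots, X_n; \theta)\right]^2} \qquad (9.3.6)$$

When $X_1, \ldots, X_n$ represent a random sample,

$$f(x_1, \ldots, x_n; \theta) = f(x_1; \theta)\cdots f(x_n; \theta)$$

so that

$$u(x_1, \ldots, x_n; \theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\ln f(x_i; \theta)$$

in which case

$$E(U^2) = \mathrm{Var}(U) = n\,\mathrm{Var}\left[\frac{\partial}{\partial\theta}\ln f(X; \theta)\right] = nE\left[\frac{\partial}{\partial\theta}\ln f(X; \theta)\right]^2$$

which yields inequality (9.3.4).
Note that if the proper differentiability conditions hold, as mentioned earlier, it can be shown that

$$E\left[\frac{\partial}{\partial\theta}\ln f(X; \theta)\right]^2 = -E\left[\frac{\partial^2}{\partial\theta^2}\ln f(X; \theta)\right]$$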


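Consider a random sample from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$. Because

$$\ln f(x; \theta) = -x/\theta - \ln\theta$$

$$\frac{\partial}{\partial\theta}\ln f(x; \theta) = x/\theta^2 - 1/\theta = (x - \theta)/\theta^2$$

Thus,

$$E\left[\frac{\partial}{\partial\theta}\ln f(X; \theta)\right]^2 = E[(X - \theta)^2/\theta^4] = \theta^2/\theta^4 = 1/\theta^2$$

and the CRLB for $\tau(\theta) = \theta$ is $1/[n(1/\theta^2)] = \theta^2/n$. Because $\mathrm{Var}(\bar{X}) = \theta^2/n$, it follows that $\bar{X}$ is the UMVUE of $\theta$.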
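Both expressions for the information per observation, $E[(\partial/\partial\theta)\ln f]^2$ and $-E[(\partial^2/\partial\theta^2)\ln f]$, can be approximated by simulation and compared with $1/\theta^2$. In this Python sketch $\theta = 1.5$, $n = 8$, the number of draws, and the seed are illustrative assumptions; the second derivative $-2x/\theta^3 + 1/\theta^2$ follows from differentiating the score above.

```python
import random
from statistics import fmean

def exp_score(x, theta):
    """d/dtheta ln f(x; theta) = (x - theta)/theta^2 for EXP(theta)."""
    return (x - theta) / theta ** 2

def exp_second_deriv(x, theta):
    """d^2/dtheta^2 ln f(x; theta) = -2x/theta^3 + 1/theta^2."""
    return -2 * x / theta ** 3 + 1 / theta ** 2

theta, n = 1.5, 8
rng = random.Random(3)
draws = [rng.expovariate(1.0 / theta) for _ in range(200_000)]
info_sq = fmean(exp_score(x, theta) ** 2 for x in draws)      # E[score^2]
info_neg = -fmean(exp_second_deriv(x, theta) for x in draws)  # -E[second derivative]
crlb = 1.0 / (n * (1.0 / theta ** 2))                          # theta^2/n for tau(theta) = theta
print(info_sq, info_neg, 1 / theta ** 2, crlb)
```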


It is possible to obtain more information about the type of estimator whose variance can attain the CRLB by further considering the derivation of inequality (9.3.6). The lower bound is attained only when the correlation coefficient of $T$ and $U$ is $\pm 1$. It follows from Theorem 5.3.1 that this occurs if and only if $T$ and $U$ are linearly related, say $T = aU + b$ with probability 1 for some constants $a \ne 0$ and $b$. Thus, for $T$ to attain the CRLB of $\tau(\theta)$, it must be a linear function of

$$U = \sum_{i=1}^n \frac{\partial}{\partial\theta}\ln f(X_i; \theta)$$


We take a random sample $X_1, \ldots, X_n$ from a geometric distribution with parameter $\theta = p$, $X_i \sim \mathrm{GEO}(\theta)$, and we wish to find a UMVUE for $\tau(\theta) = 1/\theta$. Because

$$\ln f(x; \theta) = \ln\theta + (x - 1)\ln(1 - \theta)$$

$$\frac{\partial}{\partial\theta}\ln f(x; \theta) = \frac{1}{\theta} - \frac{x - 1}{1 - \theta} = -\frac{x - 1/\theta}{1 - \theta}$$

For the variance of an unbiased estimator $T$ to attain the CRLB, it must have the form

$$T = a\sum_{i=1}^n \frac{X_i - 1/\theta}{\theta - 1} + b$$

which also can be expressed as a linear function of the sample mean, say $T = c\bar{X} + d$ for constants $c$ and $d$. Because $\bar{X}$ is unbiased for $1/\theta$, then necessarily $c = 1$ and $d = 0$, so that $T = \bar{X}$ is the only such estimator. The variance of $\bar{X}$ is $\mathrm{Var}(\bar{X}) = (1 - \theta)/(n\theta^2)$, which also can be shown to be the CRLB for this case.
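The claim that $\mathrm{Var}(\bar{X})$ equals the CRLB can be checked numerically. This Python sketch sums the series for $E[(\partial/\partial\theta)\ln f(X; \theta)]^2$ directly (truncated at a finite number of terms, an approximation assumed negligible here) and forms $[\tau'(\theta)]^2/(nI(\theta))$ with $\tau(\theta) = 1/\theta$. The values $\theta = 0.3$ and $n = 12$ are illustrative.

```python
def geometric_pmf(x, theta):
    """GEO(theta): number of trials to the first success, x = 1, 2, ..."""
    return theta * (1 - theta) ** (x - 1)

def geometric_fisher_info(theta, terms=2_000):
    """E[(d/dtheta ln f(X; theta))^2], summing the (truncated) series numerically."""
    total = 0.0
    for x in range(1, terms + 1):
        score = (1 - x * theta) / (theta * (1 - theta))   # = -(x - 1/theta)/(1 - theta)
        total += score ** 2 * geometric_pmf(x, theta)
    return total

theta, n = 0.3, 12
tau_prime = -1 / theta ** 2                                # derivative of tau(theta) = 1/theta
crlb = tau_prime ** 2 / (n * geometric_fisher_info(theta))
var_xbar = (1 - theta) / (n * theta ** 2)
print(crlb, var_xbar)                                      # agree up to rounding
```

The series sum also reproduces the closed form $1/[\theta^2(1 - \theta)]$ for the information per observation.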


This discussion also suggests that only certain types of functions will admit an unbiased estimator whose variance can attain the CRLB.
If an unbiased estimator for $\tau(\theta)$ exists whose variance achieves the CRLB, then only a linear function of $\tau(\theta)$ will admit an unbiased estimator whose variance achieves the corresponding CRLB.


Thus, in the previous example, there is no unbiased estimator whose variance attains the CRLB for unbiased estimators of $\theta$, because $\theta$ is not a linear function of $1/\theta$. It cannot be concluded from this that a UMVUE of $\theta$ does not exist, only that it cannot be found by the CRLB approach. In the next chapter we will study a method that often works when the present approach fails.

Comparisons involving the variances of estimators often are used to decide 
which method makes more efficient use of the data. 


Definition 9.3.3

Efficiency The relative efficiency of an unbiased estimator $T$ of $\tau(\theta)$ to another unbiased estimator $T^*$ of $\tau(\theta)$ is given by

$$\mathrm{re}(T, T^*) = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)} \qquad (9.3.7)$$

An unbiased estimator $T^*$ of $\tau(\theta)$ is said to be efficient if $\mathrm{re}(T, T^*) \le 1$ for all unbiased estimators $T$ of $\tau(\theta)$, and all $\theta \in \Omega$. The efficiency of an unbiased estimator $T$ of $\tau(\theta)$ is given by

$$e(T) = \mathrm{re}(T, T^*) \qquad (9.3.8)$$

if $T^*$ is an efficient estimator of $\tau(\theta)$.


Notice that in this terminology an efficient estimator is just a UMVUE. 

The notion of relative efficiency can be interpreted in terms of the sample sizes required for two types of estimators to estimate a parameter with comparable accuracy. Specifically, suppose that $T_1$ and $T_2$ are unbiased estimators of $\tau(\theta)$, and that the variances are of the form $\mathrm{Var}(T_1) = k_1/n$ and $\mathrm{Var}(T_2) = k_2/n$. The relative efficiency in this case is of the form $\mathrm{re}(T_1, T_2) = k_2/k_1$. If it is desired to choose sample sizes, say $n_1$ and $n_2$, to achieve the same variance by either method, then $k_1/n_1 = k_2/n_2$, which implies $n_2/n_1 = \mathrm{re}(T_1, T_2)$. In other words, if $T_1$ is less efficient than $T_2$, one could choose a larger sample size, by a factor of $k_1/k_2$, to achieve equal variances.

Some authors define the efficiency of T to be the ratio CRLB/Var(T), which 
allows the possibility that a UMVUE could exist but not be efficient by this 
definition. However, it does follow that if CRLB/Var(T) = 1, then T is an effi- 
cient estimator by Definition 9.3.3. At this point, use of the CRLB is the only 
convenient means we have to verify that an estimator is efficient. 
Example 9.3.6 Recall in Example 9.3.2 that the estimator $T = (n-1)/(n\bar{X})$ is unbiased for $\tau(\theta) = 1/\theta$. In this case $\tau'(\theta) = -1/\theta^2$ and the CRLB is $[-1/\theta^2]^2/[n(1/\theta^2)] = 1/(n\theta^2)$. It was found in Example 9.3.4 that the variance of $\bar{X}$ attains the CRLB of unbiased estimators of $\theta$. Because $\tau(\theta) = 1/\theta$ is not a linear function of $\theta$, there is no unbiased estimator of $1/\theta$ whose variance equals $1/(n\theta^2)$. In terms of the random variable $Y = 2n\bar{X}/\theta$, we can express $T$ as $T = [2(n-1)/\theta]Y^{-1}$. From this and equation (8.3.6) we can show that $\mathrm{Var}(T) = 1/[(n-2)\theta^2]$. Even though $\mathrm{Var}(T)$ does not attain the CRLB, it is quite close to it for large $n$. It often is possible to obtain an unbiased estimator whose variance is close to the CRLB even though it does not achieve it exactly, so the CRLB can be useful in evaluating a proposed estimator, whether a UMVUE exists or not. Actually, we will be able to show in the next chapter that the estimator $T$ is a UMVUE for $1/\theta$. This means that $T$ is an example of an efficient estimator that cannot be obtained by the CRLB method.


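Example 9.3.7 Recall in Example 9.3.3 that we had unbiased estimators $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = nX_{1:n}$ of $\theta$. It was later found that $\hat{\theta}_1$ is a UMVUE. Thus, $\hat{\theta}_1$ is an efficient estimator of $\theta$ and the efficiency of $\hat{\theta}_2$ is

$$e(\hat{\theta}_2) = \mathrm{re}(\hat{\theta}_2, \hat{\theta}_1) = \frac{\theta^2/n}{\theta^2} = \frac{1}{n}$$

and thus $\hat{\theta}_2$ is a very poor estimator of $\theta$ because its efficiency is small for large $n$.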


A slightly biased estimator that is highly concentrated about the parameter of interest may be preferable to an unbiased estimator that is less concentrated. Thus, it is desirable to have more general criteria that allow for both biased and unbiased estimators to be compared.


Definition 9.3.4 
If $T$ is an estimator of $\tau(\theta)$, then the bias is given by

$$b(T) = E(T) - \tau(\theta) \qquad (9.3.9)$$

and the mean squared error (MSE) of $T$ is given by

$$\mathrm{MSE}(T) = E[T - \tau(\theta)]^2 \qquad (9.3.10)$$


Theorem 9.3.2 If $T$ is an estimator of $\tau(\theta)$, then

$$\mathrm{MSE}(T) = \mathrm{Var}(T) + [b(T)]^2 \qquad (9.3.11)$$
Proof

$$\begin{aligned}
\mathrm{MSE}(T) &= E[T - \tau(\theta)]^2 \\
&= E[T - E(T) + E(T) - \tau(\theta)]^2 \\
&= E[T - E(T)]^2 + 2[E(T) - \tau(\theta)][E(T) - E(T)] + [E(T) - \tau(\theta)]^2 \\
&= \mathrm{Var}(T) + [b(T)]^2
\end{aligned}$$


The MSE is a reasonable criterion that considers both the variance and the bias of an estimator, and it agrees with the variance criterion if attention is restricted to unbiased estimators. It provides a useful means for comparing two or more estimators, but it is not possible to obtain an estimator that has uniformly minimum MSE for all $\theta \in \Omega$ and all possible estimators.


Consider a family of pdf's $f(x; \theta)$ where the parameter space $\Omega$ contains at least two values. If no restrictions are placed on the type of estimators under consideration, then constant estimators, $\hat{\theta}_c = c$ for $c \in \Omega$, cannot be excluded. Such estimators clearly are not desirable from a practical point of view, because they do not even depend on the sample, yet each such estimator has a small MSE for values of $\theta$ near $c$. In particular, $\mathrm{MSE}(\hat{\theta}_c) = (c - \theta)^2$, which is zero if $\theta = c$. This means that for a uniformly minimum MSE estimator, $\hat{\theta}$, necessarily $\mathrm{MSE}(\hat{\theta}) = 0$ for all $\theta \in \Omega$. This would mean that $\hat{\theta}$ is constant, say $\hat{\theta} = c^*$ (with probability 1). Now, if $\theta \in \Omega$ and $\theta \ne c^*$, then $\mathrm{MSE}(\hat{\theta}) = (c^* - \theta)^2 > 0$, in which case $\hat{\theta}$ does not have uniformly minimum MSE.


If the class of estimators under consideration can be restricted to a smaller class, then it may be possible to find a uniformly minimum MSE estimator. For example, restriction to unbiased estimators eliminates estimators of the constant type, because $\hat{\theta}_c = c$ is not an unbiased estimator of $\theta$.


Consider a random sample from a two-parameter exponential distribution with known scale parameter, say $\theta = 1$, and unknown location parameter $\eta$. In other words, $X_i \sim \mathrm{EXP}(1, \eta)$.

We wish to compare the MME and the MLE, $\hat\eta_1$ and $\hat\eta_2$, respectively.

Specifically, let $\hat\eta_1 = \bar X - 1$ and $\hat\eta_2 = X_{1:n}$. It is easy to show that $\bar X - \eta \sim \mathrm{GAM}(1/n, n)$ and $X_{1:n} - \eta \sim \mathrm{EXP}(1/n)$, and it follows that

$$E(\hat\eta_1) = E(\bar X - 1) = E(\bar X) - 1 = 1 + \eta - 1 = \eta$$

and

$$E(\hat\eta_2) = E(X_{1:n}) = E(X_{1:n} - \eta) + \eta = 1/n + \eta$$


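Thus, $\hat\eta_1$ is unbiased and $\hat\eta_2$ is biased with bias term $b(\hat\eta_2) = 1/n$. Their MSEs are

$$\mathrm{MSE}(\hat\eta_1) = \mathrm{Var}(\bar X - 1) = \mathrm{Var}(\bar X) = 1/n$$

and

$$\begin{aligned}
\mathrm{MSE}(\hat\eta_2) &= \mathrm{Var}(\hat\eta_2) + (1/n)^2 = \mathrm{Var}(X_{1:n}) + (1/n)^2 \\
&= \mathrm{Var}(X_{1:n} - \eta) + (1/n)^2 = (1/n)^2 + (1/n)^2 = 2/n^2
\end{aligned}$$

Thus, for $n > 2$ the biased estimator has a smaller MSE than does the unbiased estimator, and the ratio of the two MSEs, $2/n$, shrinks rapidly as $n$ grows.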


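It also is possible to adjust $\hat\eta_2$ to be unbiased, say $\hat\eta_3 = X_{1:n} - 1/n$, so that $E(\hat\eta_3) = E(X_{1:n}) - 1/n = \eta + 1/n - 1/n = \eta$ and $\mathrm{MSE}(\hat\eta_3) = \mathrm{Var}(X_{1:n}) = \mathrm{Var}(X_{1:n} - \eta) = 1/n^2$. Thus, for $n > 1$, $\hat\eta_3$ has the smallest MSE of the three.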
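The three MSEs just derived can be checked by simulation. This Python sketch uses only the standard library; n = 10, η = 3, the seed and the number of replications are arbitrary illustrative choices. It draws samples from EXP(1, η) as η plus standard exponential variates and compares Monte Carlo MSEs of $\hat\eta_1$, $\hat\eta_2$ and $\hat\eta_3$ with $1/n$, $2/n^2$ and $1/n^2$.

```python
import random

def simulate_mse(n, eta, reps=20_000, seed=1):
    """Monte Carlo MSEs of eta_1 = Xbar - 1, eta_2 = X_{1:n} and
    eta_3 = X_{1:n} - 1/n for a random sample from EXP(1, eta)."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(reps):
        # X_i = eta + E_i with E_i ~ EXP(1): location eta, scale 1
        xs = [eta + rng.expovariate(1.0) for _ in range(n)]
        xbar, xmin = sum(xs) / n, min(xs)
        for k, est in enumerate((xbar - 1.0, xmin, xmin - 1.0 / n)):
            sums[k] += (est - eta) ** 2
    return [s / reps for s in sums]

def exact_mse(n):
    """MSEs derived in the text: 1/n, 2/n^2 and 1/n^2."""
    return [1.0 / n, 2.0 / n**2, 1.0 / n**2]

n, eta = 10, 3.0
for name, sim, exact in zip(("eta_1", "eta_2", "eta_3"),
                            simulate_mse(n, eta), exact_mse(n)):
    print(f"{name}: simulated {sim:.5f}, exact {exact:.5f}")
```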

It is interesting to note that in Example 9.3.3, when the sample was assumed to be from an exponential distribution with unknown scale parameter $\theta$, the MLE of $\theta$, which is $\bar X$, was much superior to the one based on $X_{1:n}$. In the present example, where the distribution is exponential with a known scale but unknown location parameter, the result is just the reverse.


LARGE-SAMPLE PROPERTIES 


We have discussed properties of estimators such as unbiasedness and uniformly minimum variance. These are defined for any fixed sample size n, and are examples of "small-sample" properties. It also is useful to consider asymptotic or "large-sample" properties of a particular type of estimator. An estimator may have undesirable properties for small n, but still be a reasonable estimator in certain applications if it has good asymptotic properties as the sample size increases. It also is possible quite often to evaluate the asymptotic properties of an estimator when small-sample properties are difficult to determine.


Definition 9.4.1

Simple Consistency Let $\{T_n\}$ be a sequence of estimators of $\tau(\theta)$. These estimators are said to be consistent estimators of $\tau(\theta)$ if for every $\varepsilon > 0$,

$$\lim_{n\to\infty} P[|T_n - \tau(\theta)| < \varepsilon] = 1 \qquad (9.4.1)$$

for every $\theta \in \Omega$.
In the terminology of Chapter 7, $T_n$ converges stochastically to $\tau(\theta)$, $T_n \xrightarrow{P} \tau(\theta)$ as $n \to \infty$. Sometimes this also is referred to as simple consistency. One interpretation of consistency is that for larger sample sizes the estimator tends to be more concentrated about $\tau(\theta)$, and by making n sufficiently large $T_n$ can be made as concentrated as desired.

Another slightly stronger type of consistency is based on the MSE.


Definition 9.4.2

MSE Consistency If $\{T_n\}$ is a sequence of estimators of $\tau(\theta)$, then they are called mean squared error consistent if

$$\lim_{n\to\infty} E[T_n - \tau(\theta)]^2 = 0 \qquad (9.4.2)$$

for every $\theta \in \Omega$.


Another desirable property is asymptotic unbiasedness. 


Definition 9.4.3
Asymptotic Unbiasedness A sequence $\{T_n\}$ is said to be asymptotically unbiased for $\tau(\theta)$ if

$$\lim_{n\to\infty} E(T_n) = \tau(\theta) \qquad (9.4.3)$$

for all $\theta \in \Omega$.


It can be shown that an MSE consistent sequence also is asymptotically unbiased and simply consistent.


A sequence $\{T_n\}$ of estimators of $\tau(\theta)$ is mean squared error consistent if and only if it is asymptotically unbiased and $\lim_{n\to\infty} \mathrm{Var}(T_n) = 0$.


Proof
This follows immediately from Theorem 9.3.2, because

$$\mathrm{MSE}(T_n) = \mathrm{Var}(T_n) + [E(T_n) - \tau(\theta)]^2$$

Because both terms on the right are nonnegative, $\mathrm{MSE}(T_n) \to 0$ implies both $\mathrm{Var}(T_n) \to 0$ and $E(T_n) \to \tau(\theta)$. The converse is obvious.
In Example 9.3.2 we considered the reciprocal of the sample mean, which we now denote by $T_n = 1/\bar X$, as an estimator of $\tau(\theta) = 1/\theta$. As noted earlier, $Y = 2n\bar X/\theta \sim \chi^2(2n)$. It follows from equation (8.3.6) that $E(T_n) = [n/(n-1)](1/\theta)$ and $\mathrm{Var}(T_n) = [n/(n-1)]^2/[(n-2)\theta^2]$, so that $E(T_n) \to 1/\theta$ and $\mathrm{Var}(T_n) \to 0$ as $n \to \infty$. Thus, even though $T_n$ is not unbiased, it is asymptotically unbiased and MSE consistent for $\tau(\theta) = 1/\theta$.
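A short sketch of the example above: the exact moments show the bias and variance of $T_n = 1/\bar X$ both vanishing as n grows. The function names and the choice θ = 2 are illustrative assumptions; the simulation only spot-checks $E(T_n)$ for a single n.

```python
import random

def exact_moments(n, theta):
    """Exact E(T_n), Var(T_n) and MSE(T_n) for T_n = 1/Xbar, X_i ~ EXP(theta), n > 2."""
    mean = n / ((n - 1) * theta)
    var = (n / (n - 1)) ** 2 / ((n - 2) * theta ** 2)
    mse = var + (mean - 1 / theta) ** 2  # Theorem 9.3.2
    return mean, var, mse

def simulated_mean(n, theta, reps=20_000, seed=2):
    """Monte Carlo estimate of E(1/Xbar); expovariate takes the rate 1/theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1 / theta) for _ in range(n)) / n
        total += 1 / xbar
    return total / reps

theta = 2.0  # target tau(theta) = 1/theta = 0.5
for n in (5, 20, 100, 1000):
    mean, var, mse = exact_moments(n, theta)
    print(f"n={n:5d}  E(T_n)={mean:.5f}  Var(T_n)={var:.6f}  MSE={mse:.6f}")
print("simulated E(T_20):", simulated_mean(20, theta))
```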


As mentioned earlier, MSE consistency is a stronger property than simple consistency.

If a sequence $\{T_n\}$ is mean squared error consistent, it also is simply consistent.


Proof

This follows from the Markov inequality (2.4.11), with $X = T_n - \tau(\theta)$, $r = 2$, and $c = \varepsilon$, so that

$$P[|T_n - \tau(\theta)| < \varepsilon] \ge 1 - \mathrm{MSE}(T_n)/\varepsilon^2$$

which approaches 1 as $n \to \infty$.


Let $X_1, \ldots, X_n$ be a random sample from a distribution with finite mean $\mu$ and variance $\sigma^2$. It was shown in Chapter 7 that the sample mean, $\bar X_n$, converges stochastically to $\mu$, and if the fourth moment, $\mu_4'$, is finite, then the sample variance, $S_n^2$, converges stochastically to $\sigma^2$. Actually, because $\bar X_n$ and $S_n^2$ are unbiased and their respective variances approach zero as $n \to \infty$, it follows that they are both simply and MSE consistent.

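If the distribution is exponential, $X_i \sim \mathrm{EXP}(\theta)$, then it follows that $\bar X_n$ is MSE consistent for $\theta$, but the estimator $\tilde\theta_n = nX_{1:n}$ is not even simply consistent, because $nX_{1:n} \sim \mathrm{EXP}(\theta)$ for every n, so its distribution does not concentrate about $\theta$ as n grows.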

If the distribution is the two-parameter exponential distribution, $\mathrm{EXP}(1, \eta)$, as in Example 9.3.10, then the unbiased estimator $\hat\eta_3 = X_{1:n} - 1/n$ is MSE consistent, because $\mathrm{MSE}(\hat\eta_3) = 1/n^2 \to 0$. However, $\hat\eta_4 = c(X_{1:n} - 1/n)$ is not MSE consistent if $c \ne 1$ is fixed and $\eta \ne 0$, because $\mathrm{MSE}(\hat\eta_4) = c^2/n^2 + (c-1)^2\eta^2 \to (c-1)^2\eta^2 \ne 0$. The choice of c that minimized MSE when $\eta = 1$ was $c = n^2/(1 + n^2)$, which has limit 1. In general, if $c \to 1$ as $n \to \infty$, then $\mathrm{MSE}(\hat\eta_4) \to 0$, and $\hat\eta_4$ is MSE consistent, and also asymptotically unbiased for $\eta$.


If $\{T_n\}$ is simply consistent for $\tau(\theta)$ and if $g(t)$ is continuous at each value of $\tau(\theta)$, then $g(T_n)$ is simply consistent for $g(\tau(\theta))$.
Proof
This follows immediately from Theorem 7.7.2 with $Y_n = T_n$ and $c = \tau(\theta)$.

A special application of this theorem is that if $\tau(\theta)$ is a continuous function of $\theta$ and $\hat\theta_n$ is simply consistent for $\theta$, then $\tau(\hat\theta_n)$ is simply consistent for $\tau(\theta)$.

It also is possible to formulate an asymptotic version of efficiency.


Definition 9.4.4

Asymptotic Efficiency Let $\{T_n\}$ and $\{T_n^*\}$ be two asymptotically unbiased sequences of estimators for $\tau(\theta)$. The asymptotic relative efficiency of $T_n$ relative to $T_n^*$ is given by

$$\mathrm{are}(T_n, T_n^*) = \lim_{n\to\infty} \frac{\mathrm{Var}(T_n^*)}{\mathrm{Var}(T_n)} \qquad (9.4.4)$$

The sequence $\{T_n^*\}$ is said to be asymptotically efficient if $\mathrm{are}(T_n, T_n^*) \le 1$ for all other asymptotically unbiased sequences $\{T_n\}$, and all $\theta \in \Omega$. The asymptotic efficiency of an asymptotically unbiased sequence $\{T_n\}$ is given by

$$\mathrm{ae}(T_n) = \mathrm{are}(T_n, T_n^*) \qquad (9.4.5)$$

if $\{T_n^*\}$ is asymptotically efficient.


The CRLB is not always attainable for fixed n, but it often is attainable asymp- 
totically, in which case it can be quite useful in determining asymptotic efficiency. 


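Recall that in Example 9.4.1, which involved sampling from $\mathrm{EXP}(\theta)$, the sequence $T_n = 1/\bar X$ was shown to be asymptotically unbiased for $1/\theta$. The variance is $\mathrm{Var}(T_n) = [n/(n-1)]^2/[(n-2)\theta^2]$ and the CRLB is $[-1/\theta^2]^2/[n(1/\theta^2)] = 1/[n\theta^2]$. Because

$$\lim_{n\to\infty} \frac{\mathrm{CRLB}}{\mathrm{Var}(T_n)} = \lim_{n\to\infty} \frac{1/[n\theta^2]}{[n/(n-1)]^2/[(n-2)\theta^2]} = \lim_{n\to\infty} \frac{(n-1)^2(n-2)}{n^3} = 1$$

it follows that $T_n$ is asymptotically efficient for estimating $1/\theta$.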


Consider again Example 9.3.9, where the population pdf is a two-parameter exponential, $X_i \sim \mathrm{EXP}(1, \eta)$. Because the range of $X_i$ depends on $\eta$, the CRLB cannot be used here. The estimators $\hat\eta_2 = X_{1:n}$ and $\hat\eta_3 = X_{1:n} - 1/n$ are both asymptotically unbiased, and both have the same variance, $\mathrm{Var}(\hat\eta_2) = \mathrm{Var}(\hat\eta_3) = 1/n^2$, and thus

$$\mathrm{are}(\hat\eta_2, \hat\eta_3) = \lim_{n\to\infty} \frac{1/n^2}{1/n^2} = 1$$

We will see later that $\hat\eta_3$ is a UMVUE for $\eta$, and thus $\hat\eta_2$ also is asymptotically efficient for estimating $\eta$.

Another idea might be to compare MSEs rather than variances. The present example suggests that this generally will not provide the same criterion as comparing variances, because

$$\lim_{n\to\infty} \frac{\mathrm{MSE}(\hat\eta_3)}{\mathrm{MSE}(\hat\eta_2)} = \lim_{n\to\infty} \frac{1/n^2}{2/n^2} = \frac{1}{2}$$

This results from the fact that the bias squared and the variance are both the same power of $1/n$, namely $1/n^2$. Thus, some limitation on the bias must be considered or the asymptotic relative efficiency may be misleading.

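In Example 9.3.9, another unbiased estimator, $\hat\eta_1 = \bar X - 1$, also was considered, and $\mathrm{Var}(\hat\eta_1) = 1/n$. Thus

$$\mathrm{are}(\hat\eta_1, \hat\eta_3) = \lim_{n\to\infty} \frac{1/n^2}{1/n} = 0$$

so $\hat\eta_1$ is not as desirable as $\hat\eta_3$.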

It should be noted that this is an unusual example; in most cases the variance of an estimator is of the form $c/n$, whereas in the case of $\hat\eta_3$ it is $1/n^2$, which is a higher power of $1/n$.

An estimator with variance of order $1/n^2$ usually is referred to as a superefficient estimator.
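The different rates, $1/n^2$ for $\mathrm{Var}(X_{1:n})$ versus $1/n$ for $\mathrm{Var}(\bar X)$, are what drive $\mathrm{are}(\hat\eta_1, \hat\eta_3) = 0$. The hedged simulation sketch below takes $\eta = 0$ (an assumption that loses nothing, since neither variance depends on the location) and estimates $n^2\mathrm{Var}(X_{1:n})$ and $n\,\mathrm{Var}(\bar X)$; both should stay near 1 as n changes. Seed and replication count are arbitrary.

```python
import random
import statistics

def scaled_variances(n, reps=10_000, seed=3):
    """Simulated (n^2 * Var(X_{1:n}), n * Var(Xbar)) for EXP(1, 0) samples.

    Both variances are free of the location eta, so eta = 0 is used.
    """
    rng = random.Random(seed)
    mins, means = [], []
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        mins.append(min(xs))
        means.append(sum(xs) / n)
    return n**2 * statistics.pvariance(mins), n * statistics.pvariance(means)

for n in (10, 40):
    a, b = scaled_variances(n)
    print(f"n={n}: n^2*Var(X_1:n)={a:.3f}  n*Var(Xbar)={b:.3f}")
```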

It often is difficult to obtain an exact expression for the variance of a proposed estimator. Another approach, which sometimes is used, restricts attention only to estimators that are asymptotically normal and replaces the exact variances with asymptotic variances in the definition of asymptotic relative efficiency. Specifically, if $\{T_n\}$ and $\{T_n^*\}$ are asymptotically normal with asymptotic mean $\tau(\theta)$ and respective asymptotic variances $k(\theta)/n$ and $k^*(\theta)/n$, then the alternative definition is

$$\mathrm{are}(T_n, T_n^*) = \frac{k^*(\theta)}{k(\theta)} \qquad (9.4.6)$$

Of course, this approach is appropriate only for comparing asymptotically normal estimators, but it is somewhat simpler to use in many cases. An estimator $T_n^*$ is at least as good as $T_n$ if $k^*(\theta) \le k(\theta)$ for all $\theta \in \Omega$, and $T_n^*$ is asymptotically efficient, in this sense, if this inequality holds for all asymptotically normal estimators $T_n$. Such an estimator is often referred to as best asymptotically normal (BAN).


A random sample of size n is drawn from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$, and it is desired to estimate the distribution median, $\tau(\theta) = (\ln 2)\theta$. One possible estimator is the sample median, $T_n = X_{r:n}$, with $r/n \to 1/2$ as $n \to \infty$. Under the conditions of Theorem 7.5.1 with $p = 1/2$, $T_n$ is asymptotically normal with asymptotic mean $\tau(\theta)$ and asymptotic variance $\theta^2/n$. Another possibility is based on the sample mean, $T_n^* = (\ln 2)\bar X_n$. It follows from the central limit theorem that $Z_n = \sqrt{n}(\bar X_n - \theta)/\theta \xrightarrow{d} N(0, 1)$ as $n \to \infty$, and consequently $T_n^*$ is asymptotically normal with asymptotic mean and variance, respectively, $\tau(\theta)$ and $(\ln 2)^2\theta^2/n$. Thus, $k(\theta) = \theta^2$ and $k^*(\theta) = (\ln 2)^2\theta^2$, and by definition (9.4.6) the asymptotic relative efficiency is $(\ln 2)^2\theta^2/\theta^2 = (\ln 2)^2 \approx 0.48$, and $T_n^*$ is the better estimator. Actually, it is possible to show, by comparison with the CRLB, that $T_n^*$ is efficient, but it still might be useful in some applications to know that for large samples, a method based on the sample median is 48% as efficient as one based on the sample mean.
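The value $(\ln 2)^2 \approx 0.48$ can also be approached through exact finite-sample variances. This sketch relies on the Rényi representation of exponential order statistics, a standard result that is not derived in this text, which gives $\mathrm{Var}(X_{r:n}) = \theta^2\sum_{i=1}^{r} 1/(n-i+1)^2$; it takes $r = (n+1)/2$ for odd n and compares with $\mathrm{Var}(T_n^*) = (\ln 2)^2\theta^2/n$. The helper names are illustrative.

```python
import math

def median_variance(n, theta=1.0):
    """Exact Var(X_{r:n}), r = (n + 1) // 2, for X_i ~ EXP(theta).

    Renyi representation: X_{r:n} = theta * sum_{i=1}^{r} E_i / (n - i + 1)
    with E_i iid EXP(1), hence Var = theta^2 * sum 1 / (n - i + 1)^2.
    """
    r = (n + 1) // 2
    return theta**2 * sum(1.0 / (n - i + 1) ** 2 for i in range(1, r + 1))

def variance_ratio(n, theta=1.0):
    """Var(T_n*) / Var(T_n) with T_n* = (ln 2) * Xbar and T_n = X_{r:n}."""
    return (math.log(2) ** 2 * theta**2 / n) / median_variance(n, theta)

for n in (11, 101, 10_001):
    print(n, round(variance_ratio(n), 4))
print("(ln 2)^2 =", round(math.log(2) ** 2, 4))
```

The ratio does not depend on θ, which is why only k(θ) and k*(θ) up to a common factor matter in (9.4.6).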


ASYMPTOTIC PROPERTIES OF MLEs


Under certain circumstances, it can be shown that the MLEs have very desirable properties. Specifically, if certain regularity conditions are satisfied, then the solutions, $\hat\theta_n$, of the maximum likelihood equations have the following properties:

1. $\hat\theta_n$ exists and is unique,
2. $\hat\theta_n$ is a consistent estimator of $\theta$,
3. $\hat\theta_n$ is asymptotically normal with asymptotic mean $\theta$ and variance
$$\frac{1}{nE\left[\frac{\partial}{\partial\theta}\ln f(X;\theta)\right]^2}, \text{ and}$$
4. $\hat\theta_n$ is asymptotically efficient.


Of course, for an MLE to result from solving the ML equation (9.2.6), it is necessary that the partial derivative of $\ln f(x; \theta)$ with respect to $\theta$ exists, and also that the set $A = \{x: f(x; \theta) > 0\}$ does not depend on $\theta$. Additional conditions involving the derivatives of $\ln f(x; \theta)$ and $f(x; \theta)$ also are required, but we will not discuss them here. Different sets of regularity conditions are discussed by Wasan (1970, p. 158) and Bickel and Doksum (1977, p. 150).

Notice that the asymptotic efficiency of $\hat\theta_n$ follows from the fact that the asymptotic variance is the same as the CRLB for unbiased estimators of $\theta$. Thus, for large n, approximately

$$\hat\theta_n \sim N(\theta, \mathrm{CRLB})$$


It also follows from Theorem 7.7.6 that if $\tau(\theta)$ is a function with nonzero derivative, then $\hat\tau_n = \tau(\hat\theta_n)$ also is asymptotically normal with asymptotic mean $\tau(\theta)$ and variance $[\tau'(\theta)]^2\,\mathrm{CRLB}$. Notice also that the asymptotic variance of $\hat\tau_n$ is the CRLB for variances of unbiased estimators of $\tau(\theta)$, so that $\hat\tau_n$ also is asymptotically efficient.
Example 9.4.6 Recall in Example 9.2.7 that the MLE of the mean $\theta$ of an exponential distribution is the sample mean, $\hat\theta_n = \bar X_n$. It is possible to infer the same asymptotic properties either from the preceding discussion or from the Central Limit Theorem. In particular, $\hat\theta_n$ is asymptotically normal with asymptotic mean $\theta$ and variance $\theta^2/n$. It also was shown in Example 9.3.4 that $\mathrm{CRLB} = \theta^2/n$. We also know the exact distribution of $\hat\theta_n$ in this case, because

$$\frac{2n\hat\theta_n}{\theta} \sim \chi^2(2n)$$

which is consistent with the asymptotic normal result

$$\frac{\sqrt{n}(\hat\theta_n - \theta)}{\theta} \xrightarrow{d} N(0, 1)$$

because a properly standardized chi-square variable has a standard normal limiting distribution.

Suppose that now we are interested in estimating

$$R = R(t; \theta) = P(X > t) = \exp(-t/\theta)$$

An approximation for the variance of $\hat R = \exp(-t/\hat\theta_n)$ is given by the asymptotic variance

$$\begin{aligned}
\mathrm{Var}(\hat R) &\doteq \left[\frac{\partial}{\partial\theta} R(t; \theta)\right]^2 (\theta^2/n) \\
&= [\exp(-t/\theta)(t/\theta^2)]^2(\theta^2/n) \\
&= [\exp(-t/\theta)(t/\theta)]^2/n \\
&= [R\ln R]^2/n
\end{aligned}$$

and thus for large n, approximately

$$\hat R \sim N(R, [R\ln R]^2/n)$$
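A simulation sketch of the approximation $\mathrm{Var}(\hat R) \doteq [R\ln R]^2/n$. The values θ = 2, t = 1 and n = 100, and the seed, are illustrative assumptions rather than values from the text; agreement is only approximate because the formula is asymptotic.

```python
import math
import random
import statistics

def delta_variance(theta, t, n):
    """Asymptotic variance [R ln R]^2 / n of R_hat = exp(-t / Xbar)."""
    R = math.exp(-t / theta)
    return (R * math.log(R)) ** 2 / n

def simulated_variance(theta, t, n, reps=4_000, seed=4):
    """Monte Carlo variance of R_hat = exp(-t / Xbar) for X_i ~ EXP(theta)."""
    rng = random.Random(seed)
    r_hats = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1 / theta) for _ in range(n)) / n
        r_hats.append(math.exp(-t / xbar))
    return statistics.variance(r_hats)

theta, t, n = 2.0, 1.0, 100
print("asymptotic:", delta_variance(theta, t, n))
print("simulated: ", simulated_variance(theta, t, n))
```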


Example 9.4.7 Consider a random sample from a Pareto distribution, $X_i \sim \mathrm{PAR}(1, \kappa)$, where $\kappa$ is unknown. Because

$$f(x; \kappa) = \kappa(1 + x)^{-\kappa - 1} \qquad x > 0$$

it follows that

$$\ln L(\kappa) = n\ln\kappa - (\kappa + 1)\sum \ln(1 + x_i)$$

and the ML equation is

$$\frac{d}{d\kappa}\ln L(\kappa) = n/\kappa - \sum \ln(1 + x_i) = 0$$

which yields the MLE

$$\hat\kappa = n \Big/ \sum \ln(1 + x_i)$$

To find the CRLB, note that

$$\ln f(x; \kappa) = \ln\kappa - (\kappa + 1)\ln(1 + x)$$

$$\frac{\partial}{\partial\kappa}\ln f(x; \kappa) = 1/\kappa - \ln(1 + x)$$

and thus,

$$\mathrm{CRLB} = 1 \big/ nE[1/\kappa - \ln(1 + X)]^2$$

To evaluate this last expression, it is convenient to consider the transformation

$$Y = \ln(1 + X) \sim \mathrm{EXP}(1/\kappa)$$

so that

$$E[\ln(1 + X)] = 1/\kappa$$
$$E[1/\kappa - \ln(1 + X)]^2 = \mathrm{Var}[\ln(1 + X)] = 1/\kappa^2$$
$$\mathrm{Var}(\hat\kappa) \doteq \mathrm{CRLB} = \kappa^2/n$$

and approximately

$$\hat\kappa \sim N(\kappa, \kappa^2/n)$$
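A sketch checking Example 9.4.7 by simulation. It generates PAR(1, κ) data through the transformation stated above, $\ln(1 + X) \sim \mathrm{EXP}(1/\kappa)$, i.e. $X = e^Y - 1$ with Y exponential of mean $1/\kappa$; κ = 3, n = 200 and the seed are arbitrary choices.

```python
import math
import random
import statistics

def pareto_mle(xs):
    """MLE kappa_hat = n / sum(ln(1 + x_i)) for PAR(1, kappa)."""
    return len(xs) / sum(math.log1p(x) for x in xs)

def sample_par1(kappa, n, rng):
    """n draws from PAR(1, kappa): X = exp(Y) - 1 with Y ~ EXP(mean 1/kappa)."""
    return [math.expm1(rng.expovariate(kappa)) for _ in range(n)]

rng = random.Random(5)
kappa, n, reps = 3.0, 200, 4_000
estimates = [pareto_mle(sample_par1(kappa, n, rng)) for _ in range(reps)]
print("mean of kappa_hat:", statistics.fmean(estimates), " kappa:", kappa)
print("variance of kappa_hat:", statistics.variance(estimates), " CRLB:", kappa**2 / n)
```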


Consider a random sample from the two-parameter exponential distribution, $X_i \sim \mathrm{EXP}(1, \eta)$. We recall from Example 9.2.8 that the MLE, $\hat\eta = X_{1:n}$, cannot be obtained as a solution to the ML equation (9.2.7), because $\ln L(\eta)$ is not differentiable over the whole parameter space. Of course, the difficulty results from the fact that the set $A = \{x: f(x; \eta) > 0\} = [\eta, \infty)$ depends on the parameter $\eta$. Thus, we have an example where the asymptotic normality of the MLE is not expected to hold.

As a matter of fact, we know from the results of Chapter 7 that the first order statistic, $X_{1:n}$, is not asymptotically normal; rather, for a suitable choice of norming constants, the corresponding limiting distribution is an extreme-value type for minimums.


Asymptotic properties such as those discussed earlier in the section exist for MLEs in the multiparameter case, but they cannot be expressed conveniently without matrix notation. Consequently, we will not consider them here.
9.5 BAYES AND MINIMAX ESTIMATORS


When an estimate differs from the true value of the parameter being estimated, 
one may consider the loss involved to be a function of this difference. If it is 
assumed that the loss increases as the square of the difference, then the MSE 
criterion simply considers the average squared error loss associated with the esti- 
mator. Clearly the MSE criterion can be generalized to other types of loss func- 
tions besides squared error. 


Definition 9.5.1

Loss Function If T is an estimator of $\tau(\theta)$, then a loss function is any real-valued function, $L(t; \theta)$, such that

$$L(t; \theta) \ge 0 \quad \text{for every } t \qquad (9.5.1)$$

and

$$L(t; \theta) = 0 \quad \text{when } t = \tau(\theta) \qquad (9.5.2)$$


Definition 9.5.2
Risk Function The risk function is defined to be the expected loss,

$$R_T(\theta) = E[L(T; \theta)] \qquad (9.5.3)$$


Thus, if a parameter or a function of a parameter is being estimated, one may choose an appropriate loss function depending on the problem, and then try to find an estimator whose average loss, or risk function, is small for all possible values of the parameter. If the loss function is taken to be squared error loss, then the risk becomes the MSE as considered previously. Another reasonable loss function is absolute error, which gives the risk function $R_T(\theta) = E|T - \tau(\theta)|$.

As for the MSE, it usually will not be possible to determine for other risk functions an estimator that has smaller risk than all other estimators uniformly for all $\theta$. When comparing two specific estimators, it is possible that one may have a smaller risk than the other for all $\theta$.
Definition 9.5.3
Admissible Estimator An estimator $T_1$ is a better estimator than $T_2$ if and only if

$$R_{T_1}(\theta) \le R_{T_2}(\theta) \quad \text{for all } \theta \in \Omega$$

and

$$R_{T_1}(\theta) < R_{T_2}(\theta) \quad \text{for at least one } \theta$$

An estimator T is admissible if and only if there is no better estimator.


Thus, if one estimator has uniformly smaller risk than another, we will retain the first estimator for consideration and eliminate the latter as not admissible. Typically, some estimators will have smallest risk for some values of $\theta$ but not for others. As mentioned earlier, one possible approach to select a best estimator is to restrict the class of estimators. This was discussed in connection with the class of unbiased estimators with MSE risk. There is no guarantee that this approach will work in every case.


In Example 9.3.9 we found an unbiased estimator, $\hat\eta_3 = X_{1:n} - 1/n$, which appeared to be reasonable for estimating the location parameter $\eta$. We now consider a class of estimators of the form $\hat\eta_4 = c\hat\eta_3$ for some constant $c > 0$. Such estimators will be biased except in the case $c = 1$, and the MSE risk would be $\mathrm{MSE}(\hat\eta_4) = \mathrm{Var}(c\hat\eta_3) + [b(c\hat\eta_3)]^2 = c^2/n^2 + (c-1)^2\eta^2$.

Let us attempt to find a member of this class of estimators with minimum MSE. This corresponds to choosing $c = n^2\eta^2/(1 + n^2\eta^2)$. Unfortunately, this depends upon the unknown parameter that we are trying to estimate. However, this suggests the possibility of choosing c to obtain an estimator that will have smaller risk, at least over a portion of the parameter space. For example, if it is suspected that $\eta$ is somewhere close to 1, then the appropriate constant would be $c = n^2/(1 + n^2)$. For this choice of c, $\mathrm{MSE}(\hat\eta_4) < \mathrm{MSE}(\hat\eta_3)$ if and only if $\eta$ satisfies the inequality $c^2/n^2 + (c-1)^2\eta^2 < 1/n^2$, which corresponds to $\eta^2 < 2 + 1/n^2$. For a sample of size $n = 3$, $c = 0.9$, $\mathrm{MSE}(\hat\eta_4) = 0.09 + 0.01\eta^2$, and $\mathrm{MSE}(\hat\eta_4) < \mathrm{MSE}(\hat\eta_3)$ if $-1.45 < \eta < 1.45$, and $\mathrm{MSE}(\hat\eta_4) > \mathrm{MSE}(\hat\eta_3)$ otherwise. Thus, the MSE is not uniformly smaller for either estimator, and one could not choose between them solely by comparing MSEs. The comparison between the MSEs for these estimators is provided in Figure 9.3.

If it could be determined a priori that $-1.45 < \eta < 1.45$, then $\hat\eta_4$ would be preferable, but otherwise it might be best to use the unbiased estimator $\hat\eta_3$.
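The comparison for n = 3 can be reproduced with a few lines of Python. The function names are hypothetical helpers introduced here; the sketch evaluates both MSE formulas at a few values of η and the crossover point $|\eta| = \sqrt{2 + 1/n^2}$ (about 1.453 when n = 3, which the text rounds to 1.45).

```python
import math

def mse_eta3(n):
    """MSE of the unbiased estimator X_{1:n} - 1/n."""
    return 1.0 / n**2

def mse_eta4(n, eta, c):
    """MSE of c * (X_{1:n} - 1/n): c^2/n^2 + (c - 1)^2 * eta^2."""
    return c**2 / n**2 + (c - 1) ** 2 * eta**2

def crossover(n):
    """With c = n^2/(1 + n^2), eta_4 beats eta_3 exactly when |eta| < sqrt(2 + 1/n^2)."""
    return math.sqrt(2 + 1 / n**2)

n = 3
c = n**2 / (1 + n**2)  # 0.9 when n = 3
for eta in (0.0, 1.0, 1.45, 1.46, 2.0):
    m3, m4 = mse_eta3(n), mse_eta4(n, eta, c)
    print(f"eta={eta:4.2f}  MSE3={m3:.4f}  MSE4={m4:.4f}  smaller: {'eta_4' if m4 < m3 else 'eta_3'}")
print("crossover |eta| =", round(crossover(n), 4))
```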


FIGURE 9.3 Comparison of MSEs for two different estimators of the threshold parameter (horizontal axis marked at $\eta = -1.45$ and $\eta = 1.45$)


Another criterion that sometimes is used to select an estimator from the class of admissible estimators is the minimax criterion.


Definition 9.5.4
Minimax Estimator An estimator $T_1$ is a minimax estimator if

$$\max_\theta R_{T_1}(\theta) \le \max_\theta R_T(\theta) \qquad (9.5.4)$$

for every estimator T.


In other words, $T_1$ is an estimator that minimizes the maximum risk, or

$$\max_\theta R_{T_1}(\theta) = \min_T \max_\theta R_T(\theta) \qquad (9.5.5)$$


Of course, this assumes that the risk function attains a maximum value for some $\theta$ and that such maximum values attain a minimum for some T. In a more general treatment of the topic, the maximum and minimum could be replaced with the more general concepts of least upper bound and greatest lower bound, respectively.

The minimax principle is a conservative approach, because it attempts to protect against the worst risk that can occur.


Consider the class of estimators of the form $\hat\eta_4 = c(X_{1:n} - 1/n)$ discussed in Example 9.3.10, and MSE risk. Recall that $\mathrm{MSE}(\hat\eta_4) = c^2/n^2 + (c-1)^2\eta^2$, which depends on $\eta$ except when $c = 1$. This last case corresponds to the unbiased estimator $\hat\eta_3$. If $0 < c < 1$, then neither $\hat\eta_3$ nor $\hat\eta_4$ has uniformly smaller MSE for all $\eta$, so we might consider using the minimax principle. Because

$$\max_\eta \mathrm{MSE}(\hat\eta_3) = 1/n^2$$

and

$$\max_\eta \mathrm{MSE}(\hat\eta_4) = \max_\eta \left[c^2/n^2 + (c-1)^2\eta^2\right] = \infty$$

the unbiased estimator, $\hat\eta_3$, is the minimax estimator within this class of estimators. It is not clear, at this point, whether $\hat\eta_3$ is minimax for any larger class of estimators.


One possible flaw of the minimax principle is illustrated by the graph of two possible risk functions in Figure 9.4. The minimax principle would choose $T_1$ over $T_2$, yet $T_2$ is much better than $T_1$ for most values of $\theta$.

In Example 9.3.10 it was suggested that an experimenter might have some prior knowledge about where, at least approximately, the parameter may be located. More generally, one might want to use an estimator that has small risk for values of $\theta$ that are "most likely" to occur in a given experiment. This can be modeled mathematically by treating $\theta$ as a random variable, say $\theta \sim p(\theta)$, where $p(\theta)$ is a function that has the usual properties (2.3.4) and (2.3.5) of a pdf in the variable $\theta$. A reasonable approach then would be to compute the "average" or expected risk of an estimator, averaged over values of $\theta$ with respect to the pdf $p(\theta)$, and choose an estimator with smallest average risk.


Definition 9.5.5

Bayes Risk For a random sample from $f(x; \theta)$, the Bayes risk of an estimator T relative to a risk function $R_T(\theta)$ and pdf $p(\theta)$ is the average risk with respect to $p(\theta)$,

$$A_T = E_\theta[R_T(\theta)] = \int_\Omega R_T(\theta)\,p(\theta)\,d\theta \qquad (9.5.6)$$

If an estimator has the smallest Bayes risk, then it is referred to as a Bayes estimator.


FIGURE 9.4 Comparison of risk functions for two estimators
Definition 9.5.6

Bayes Estimator For a random sample from $f(x; \theta)$, the Bayes estimator $T^*$ relative to the risk function $R_T(\theta)$ and pdf $p(\theta)$ is the estimator with minimum expected risk,

$$E_\theta[R_{T^*}(\theta)] \le E_\theta[R_T(\theta)] \qquad (9.5.7)$$

for every estimator T.


In some kinds of problems it is reasonable to assume that the parameter varies for different cases, and it may be proper to treat $\theta$ as a random variable. In other cases, $p(\theta)$ may reflect prior information or belief as to what the true value of the parameter may be. In either case, introduction of the pdf $p(\theta)$, which usually is called a prior density for the parameter $\theta$, constitutes an additional assumption that may be helpful or harmful depending on its correctness. In any event, averaging the risk relative to a pdf $p(\theta)$ is a procedure that provides a possible way to discriminate between two estimators when neither of their risk functions is uniformly smaller than the other for all $\theta$.

A whole class of estimators can be produced by considering different pdf's $p(\theta)$. It is useful to have a class of estimators in a problem, although if there is some physical reason to justify choosing a particular $p(\theta)$, then the estimator associated with that $p(\theta)$ would presumably be the best one to use in that problem. There are different philosophies involved with choosing prior densities $p(\theta)$, but we will not be too concerned with how $p(\theta)$ is chosen in this work. In some cases $\theta$ may indeed act like a random variable, and $p(\theta)$ would reflect this fact. Alternatively, $p(\theta)$ may represent a degree of belief concerning the value of $\theta$ arrived at from previous sampling information, or by other means. In any event, potentially useful estimators can be developed through this structure. The subject of choosing a prior pdf is discussed in books by DeGroot (1970) and Zellner (1971).

Example 9.5.3 Consider again the estimators $\hat\eta_3 = X_{1:n} - 1/n$ and $\hat\eta_4 = 0.9(X_{1:n} - 1/n)$ of Example 9.3.10. With squared error loss we found that $\hat\eta_3$ is better by the minimax principle, but $\hat\eta_4$ is better if it is known that $\eta^2 < 2 + 1/n^2$, because it has smaller MSE for $\eta$ in this subset of $\Omega$. We now assume a standard normal prior density, $\eta \sim N(0, 1)$, and compare the Bayes risks. It follows that $E_\eta[R_{\hat\eta_3}(\eta)] = E_\eta(1/n^2) = 1/n^2$ and $E_\eta[R_{\hat\eta_4}(\eta)] = E_\eta[0.81/n^2 + 0.01\eta^2] = 0.81/n^2 + 0.01$. According to this criterion, $\hat\eta_3$ is better if $n \ge 5$ and $\hat\eta_4$ is better if $n \le 4$.
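The Bayes-risk comparison reduces to arithmetic, which the sketch below tabulates. It also checks $0.81/n^2 + 0.01$ for n = 4 by simulating η from the N(0, 1) prior and then a sample from EXP(1, η). The seed, the replication count and the helper names are illustrative assumptions.

```python
import random

def bayes_risk_eta3(n):
    """E_eta[MSE(eta_3)] = 1/n^2, since this MSE does not depend on eta."""
    return 1.0 / n**2

def bayes_risk_eta4(n, prior_second_moment=1.0):
    """E_eta[0.81/n^2 + 0.01 eta^2] = 0.81/n^2 + 0.01 E(eta^2); E(eta^2) = 1 under N(0, 1)."""
    return 0.81 / n**2 + 0.01 * prior_second_moment

def simulated_bayes_risk(n, c, reps=50_000, seed=6):
    """Monte Carlo Bayes risk of c * (X_{1:n} - 1/n): eta ~ N(0, 1), then X_i ~ EXP(1, eta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        eta = rng.gauss(0.0, 1.0)
        xmin = eta + min(rng.expovariate(1.0) for _ in range(n))
        total += (c * (xmin - 1.0 / n) - eta) ** 2
    return total / reps

for n in range(1, 8):
    r3, r4 = bayes_risk_eta3(n), bayes_risk_eta4(n)
    print(n, round(r3, 5), round(r4, 5), "eta_3" if r3 < r4 else "eta_4")
print("simulated, n = 4:", simulated_bayes_risk(4, 0.9), " formula:", bayes_risk_eta4(4))
```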


A few results now are considered that are useful in determining a Bayes estimator. Note that in this framework the density function $f(x; \theta)$ is interpreted as a conditional density function $f(x \mid \theta)$.
Definition 9.5.7

Posterior Distribution The conditional density of $\theta$ given the sample observations $\mathbf{x} = (x_1, \ldots, x_n)$ is called the posterior density or posterior pdf, and is given by

$$f_{\theta|\mathbf{x}}(\theta) = \frac{f(x_1, \ldots, x_n \mid \theta)\,p(\theta)}{\int f(x_1, \ldots, x_n \mid \theta)\,p(\theta)\,d\theta} \qquad (9.5.8)$$


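The Bayes estimator is the estimator that minimizes the average risk over $\theta$, $E_\theta[R_T(\theta)]$. However,

$$\begin{aligned}
E_\theta[R_T(\theta)] &= E_\theta\{E_{\mathbf{X}|\theta}[L(T; \theta)]\} \\
&= E_{\mathbf{X}}\{E_{\theta|\mathbf{x}}[L(T; \theta)]\}
\end{aligned} \qquad (9.5.9)$$

and an estimator T that minimizes $E_{\theta|\mathbf{x}}[L(T; \theta)]$ for each $\mathbf{x}$ also minimizes the average over $\mathbf{X}$. Thus the Bayes estimator may be obtained by minimizing the expected loss relative to the posterior distribution.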


Theorem 9.5.1 If $X_1, \ldots, X_n$ denotes a random sample from $f(x \mid \theta)$, then the Bayes estimator is the estimator that minimizes the expected loss relative to the posterior distribution of $\theta \mid \mathbf{x}$,

$$E_{\theta|\mathbf{x}}[L(T; \theta)]$$


For certain types of loss functions, expressions for the Bayes estimator can be determined more explicitly in terms of the posterior distribution.

Theorem 9.5.2 The Bayes estimator, T, of $\tau(\theta)$ under the squared error loss function,

$$L(T; \theta) = [T - \tau(\theta)]^2 \qquad (9.5.10)$$

is the conditional mean of $\tau(\theta)$ relative to the posterior distribution,

$$T = E_{\theta|\mathbf{x}}[\tau(\theta)] = \int \tau(\theta)\,f_{\theta|\mathbf{x}}(\theta)\,d\theta \qquad (9.5.11)$$

Proof

See Exercise 41.


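Example 9.5.4 Consider a random sample of size n from a Bernoulli distribution,

$$f(x \mid \theta) = \theta^x(1 - \theta)^{1-x} \qquad x = 0, 1$$

and let $\theta \sim \mathrm{UNIF}(0, 1)$. Equation (9.5.8) yields the posterior density

$$f_{\theta|\mathbf{x}}(\theta) = \frac{\theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}}{\int_0^1 \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}\,d\theta} \qquad 0 < \theta < 1$$

It is convenient to express this using the notation of the beta distribution. As noted in Chapter 8, a random variable Y has a beta distribution with parameters a and b, denoted $Y \sim \mathrm{BETA}(a, b)$, if it has pdf of the form $f(y; a, b) = y^{a-1}(1 - y)^{b-1}/B(a, b)$ when $0 < y < 1$, and zero otherwise, with constant $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$. In this notation recall that the mean of the distribution is $E(Y) = a/(a + b)$. It also is possible to express the posterior distribution in terms of this notation. Specifically,

$$f_{\theta|\mathbf{x}}(\theta) = \frac{\theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}}{B\left(\sum x_i + 1,\ n - \sum x_i + 1\right)} \qquad 0 < \theta < 1 \qquad (9.5.12)$$

In other words, $\theta \mid \mathbf{x} \sim \mathrm{BETA}(\sum x_i + 1, n - \sum x_i + 1)$. Consequently, $E_{\theta|\mathbf{x}}(\theta) = (\sum x_i + 1)/[(\sum x_i + 1) + (n - \sum x_i + 1)] = (\sum x_i + 1)/(n + 2)$. With squared error loss, we have by Theorem 9.5.2 that the Bayes estimator of $\theta$ is $T = (\sum X_i + 1)/(n + 2)$.

Example 9.5.5 Suppose that $X_i \sim \mathrm{POI}(\theta)$, and we are interested in a Bayes estimator of $\theta$ assuming squared error loss. We choose to consider the class of gamma prior densities, $\theta \sim \mathrm{GAM}(\beta, \kappa)$,

$$p(\theta) = \frac{1}{\beta^\kappa\Gamma(\kappa)}\,\theta^{\kappa - 1}e^{-\theta/\beta} \qquad \theta > 0$$

where $\beta$ and $\kappa$ are known arbitrary constants. After the factors that do not involve $\theta$ cancel, the posterior distribution is given by

$$f_{\theta|\mathbf{x}}(\theta) = \frac{e^{-n\theta}\theta^{\sum x_i}\,\theta^{\kappa - 1}e^{-\theta/\beta}}{\int_0^\infty e^{-n\theta}\theta^{\sum x_i}\,\theta^{\kappa - 1}e^{-\theta/\beta}\,d\theta} = \frac{\theta^{\sum x_i + \kappa - 1}e^{-(n + 1/\beta)\theta}}{\int_0^\infty \theta^{\sum x_i + \kappa - 1}e^{-(n + 1/\beta)\theta}\,d\theta}$$

That is,

$$\theta \mid \mathbf{x} \sim \mathrm{GAM}\left[(n + 1/\beta)^{-1}, \sum x_i + \kappa\right] \qquad (9.5.13)$$

The Bayes estimator of $\theta$ is therefore

$$T = \frac{\sum X_i + \kappa}{n + 1/\beta}$$

A prior density with large $\beta$ and small $\kappa$ makes this estimator close to the MLE, $\hat\theta = \bar X$.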
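The risk in this case is

$$\begin{aligned}
R_T(\theta) &= E[T - \theta]^2 = \mathrm{Var}(T) + [E(T) - \theta]^2 \\
&= \frac{n\,\mathrm{Var}(X)}{(n + 1/\beta)^2} + \left[\frac{n\theta + \kappa}{n + 1/\beta} - \theta\right]^2 \\
&= \frac{n\theta + (\kappa - \theta/\beta)^2}{(n + 1/\beta)^2}
\end{aligned}$$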
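A sketch of Example 9.5.5. As far as the standard library goes, the `random` module offers no Poisson generator, so a simple multiplication method is used here; that helper, and the values θ = 3, n = 10, β = 1, κ = 2 and the seed, are assumptions made for illustration. The simulated risk should match the closed form above, here $31/121 \approx 0.256$.

```python
import math
import random

def bayes_estimate(xs, beta, kappa):
    """Posterior mean (sum x_i + kappa) / (n + 1/beta) under a GAM(beta, kappa) prior."""
    return (sum(xs) + kappa) / (len(xs) + 1.0 / beta)

def bayes_risk_formula(theta, n, beta, kappa):
    """R_T(theta) = [n theta + (kappa - theta/beta)^2] / (n + 1/beta)^2."""
    return (n * theta + (kappa - theta / beta) ** 2) / (n + 1.0 / beta) ** 2

def poisson_draw(rng, lam):
    """Poisson variate: multiply uniforms until the product drops to exp(-lam).
    Simple but slow for large lam; adequate for this illustration."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulated_risk(theta, n, beta, kappa, reps=20_000, seed=7):
    """Monte Carlo estimate of E[T - theta]^2 for a fixed theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [poisson_draw(rng, theta) for _ in range(n)]
        total += (bayes_estimate(xs, beta, kappa) - theta) ** 2
    return total / reps

theta, n, beta, kappa = 3.0, 10, 1.0, 2.0
print("formula:", bayes_risk_formula(theta, n, beta, kappa),
      " simulated:", simulated_risk(theta, n, beta, kappa))
```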


Some authors define the Bayes estimator to be the mean of the posterior dis- 
tribution, which (according to Theorem 9.5.2) results from squared error loss. 
However, it is desirable in some applications to use other loss functions. 


Theorem 9.5.3 The Bayes estimator, $\hat\theta$, of $\theta$ under absolute error loss,

$$L(\hat\theta; \theta) = |\hat\theta - \theta| \qquad (9.5.14)$$

is the median of the posterior distribution $f_{\theta|\mathbf{x}}(\theta)$.

Proof

See Exercise 42.
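Theorems 9.5.2 and 9.5.3 can be illustrated with the beta posterior of Example 9.5.4. The sketch below is a brute-force check rather than an efficient method: it discretizes the posterior on a grid (the grid size, candidate spacing and data vector are assumptions chosen for illustration) and searches for the value of t that minimizes each posterior expected loss. With five successes in seven trials the posterior is BETA(6, 3), whose mean is 2/3.

```python
def posterior_grid(xs, m=1000):
    """Midpoint grid on (0, 1) with normalized weights proportional to the
    BETA(sum x + 1, n - sum x + 1) posterior (uniform prior, Example 9.5.4)."""
    s, n = sum(xs), len(xs)
    thetas = [(j + 0.5) / m for j in range(m)]
    w = [th**s * (1 - th) ** (n - s) for th in thetas]
    total = sum(w)
    return thetas, [wi / total for wi in w]

def argmin_expected_loss(thetas, w, loss, candidates):
    """Candidate t minimizing the posterior expected loss sum_j w_j * loss(t, theta_j)."""
    return min(candidates, key=lambda t: sum(wj * loss(t, th) for th, wj in zip(thetas, w)))

def weighted_median(thetas, w):
    """Smallest grid value at which the posterior CDF reaches 1/2."""
    cum = 0.0
    for th, wj in zip(thetas, w):
        cum += wj
        if cum >= 0.5:
            return th

xs = [1, 0, 1, 1, 0, 1, 1]
thetas, w = posterior_grid(xs)
candidates = [k / 500 for k in range(501)]
t_sq = argmin_expected_loss(thetas, w, lambda t, th: (t - th) ** 2, candidates)
t_abs = argmin_expected_loss(thetas, w, lambda t, th: abs(t - th), candidates)
print("squared loss:", t_sq, " posterior mean (sum x + 1)/(n + 2):", (sum(xs) + 1) / (len(xs) + 2))
print("absolute loss:", t_abs, " posterior median (grid):", weighted_median(thetas, w))
```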


The Bayes estimator structure sometimes is helpful in finding a minimax estimator.

Theorem 9.5.4 If $T^*$ is a Bayes estimator with constant risk, $R_{T^*}(\theta) = c$, then $T^*$ is a minimax estimator.

Proof

We have $\max_\theta R_{T^*}(\theta) = \max_\theta c = c = R_{T^*}(\theta)$, but because $R_{T^*}(\theta)$ is constant over $\theta$,

$$R_{T^*}(\theta) = E_\theta[R_{T^*}(\theta)] \le E_\theta[R_T(\theta)]$$

for every T because $T^*$ is the Bayes estimator. Now the average of a variable is not larger than the maximum value of the variable, so

$$E_\theta[R_T(\theta)] \le \max_\theta R_T(\theta)$$

and

$$\max_\theta R_{T^*}(\theta) \le \max_\theta R_T(\theta)$$

which shows that $T^*$ is a minimax estimator. ∎
It follows that if an appropriate prior pdf, $p(\theta)$, can be found that will yield a Bayes estimator with constant risk, then the Bayes estimator also will be the minimax estimator.


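Recall the prior and posterior distributions of Example 9.5.4, but now consider a "weighted" squared error loss,

$$L(t; \theta) = \frac{(t - \theta)^2}{\theta(1 - \theta)} \qquad (9.5.15)$$

which gives more weight to the values of $\theta$ that are closer to zero or one. Note that

$$\begin{aligned}
E_{\theta|\mathbf{x}}[L(t; \theta)] &= \int_0^1 \frac{(t - \theta)^2}{\theta(1 - \theta)} \cdot \frac{\theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}}{B\left(\sum x_i + 1, n - \sum x_i + 1\right)}\,d\theta \\
&= C(\mathbf{x})\int_0^1 (t - \theta)^2\,\frac{\theta^{\sum x_i - 1}(1 - \theta)^{n - \sum x_i - 1}}{B\left(\sum x_i, n - \sum x_i\right)}\,d\theta
\end{aligned}$$

with $C(\mathbf{x}) = B(\sum x_i, n - \sum x_i)/B(\sum x_i + 1, n - \sum x_i + 1)$, which means that the expression $E_{\theta|\mathbf{x}}[L(t; \theta)]$ is minimized when the latter integral is minimized. Notice that this integral corresponds to the conditional expectation of ordinary squared error loss $(t - \theta)^2$ relative to the distribution $\mathrm{BETA}(\sum x_i, n - \sum x_i)$. By Theorem 9.5.2, this integral is minimized when t is the mean of $\mathrm{BETA}(\sum x_i, n - \sum x_i)$. This mean is $t = \sum x_i/(\sum x_i + n - \sum x_i) = \bar x$. It follows that $t^*(\mathbf{x}) = \bar x$, and the Bayes estimator is $T^* = \bar X$. Furthermore, the risk is

$$R_{T^*}(\theta) = E\left[\frac{(\bar X - \theta)^2}{\theta(1 - \theta)}\right] = \frac{\mathrm{Var}(\bar X)}{\theta(1 - \theta)} = \frac{\theta(1 - \theta)/n}{\theta(1 - \theta)} = \frac{1}{n}$$

which is constant with respect to $\theta$. By Theorem 9.5.4, $\bar X$ is a minimax estimator in this example.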
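The constant-risk property can be verified exactly by summing over the binomial distribution of $\sum X_i$. For contrast, the sketch also evaluates the weighted risk of $(\sum X_i + 1)/(n + 2)$, the squared-error Bayes estimator from Example 9.5.4, whose weighted risk grows without bound as θ approaches 0 or 1. The helper names and n = 8 are illustrative choices.

```python
from math import comb

def weighted_risk(n, theta, estimator):
    """Exact E[(T - theta)^2 / (theta (1 - theta))] for T = estimator(sum X_i),
    X_i ~ Bernoulli(theta), summing over the BIN(n, theta) law of sum X_i."""
    total = 0.0
    for s in range(n + 1):
        prob = comb(n, s) * theta**s * (1 - theta) ** (n - s)
        total += prob * (estimator(s) - theta) ** 2
    return total / (theta * (1 - theta))

n = 8

def sample_mean(s):
    return s / n  # T* = Xbar

def uniform_bayes(s):
    return (s + 1) / (n + 2)  # squared-error Bayes estimator of Example 9.5.4

for theta in (0.01, 0.3, 0.5, 0.9):
    # first column should equal 1/n = 0.125 up to rounding
    print(theta, round(weighted_risk(n, theta, sample_mean), 10),
          round(weighted_risk(n, theta, uniform_bayes), 4))
```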


SUMMARY 


Our purpose in this chapter was to provide general methods for estimating 
unknown parameters and to present criteria for evaluating the properties of esti- 
mators. The two methods receiving the most attention were the method of 
moments and the method of maximum likelihood. The MLEs were found to have 
desirable asymptotic properties under certain conditions. For example, the 
MLEs in certain cases are asymptotically efficient and asymptotically normally 
distributed. 

In general, it is desirable to have the distribution of the estimator highly concentrated about the true value of the parameter being estimated. This concentration may be reflected by an appropriate loss function, but most attention is centered on squared error loss and MSE risk. If the estimator is unbiased, then the MSE simply becomes the variance of the estimator. Within the class of unbiased estimators there may exist an estimator with uniformly minimum variance for all possible values of the parameter. This estimator is referred to as the UMVUE. At this point a direct method for finding a UMVUE has not been provided; however, if an unbiased estimator attains the CRLB, then we know it is a UMVUE. The concepts of sufficiency and completeness discussed in the next chapter will provide a more systematic approach for attempting to find a UMVUE.

In lieu of finding an estimator that has uniformly minimum variance over the parameter $\theta$, we considered the principle of minimizing the maximum variance (risk) over $\theta$ (minimax estimator), and minimizing the average variance (or more generally risk) over $\theta$, giving the Bayes estimators. Bayes estimation requires specifying an additional prior density $p(\theta)$. Information was provided on how to compute a Bayes estimator, but very little on how to find a minimax estimator.


EXERCISES 


1. Find method of moments estimators (MMEs) of $\theta$ based on a random sample $X_1, \ldots, X_n$
from each of the following pdfs:

(a) $f(x; \theta) = \theta x^{\theta - 1}$; $0 < x < 1$, zero otherwise; $0 < \theta$.

(b) $f(x; \theta) = (\theta + 1)x^{-\theta - 2}$; $1 < x$, zero otherwise; $0 < \theta$.

(c) $f(x; \theta) = \theta^2 x e^{-\theta x}$; $0 < x$, zero otherwise; $0 < \theta$.


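2. Find the MMEs based on a random sample of size $n$ from each of the following
distributions (see Appendix B):

(a) $X_i \sim \mathrm{NB}(3, p)$.

(b) $X_i \sim \mathrm{GAM}(2, \kappa)$.

(c) $X_i \sim \mathrm{WEI}(\theta, 1/2)$.

(d) $X_i \sim \mathrm{DE}(\theta, \eta)$ with both $\theta$ and $\eta$ unknown.

(e) $X_i \sim \mathrm{EV}(\theta, \eta)$ with both $\theta$ and $\eta$ unknown.

(f) $X_i \sim \mathrm{PAR}(\theta, \kappa)$ with both $\theta$ and $\kappa$ unknown.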


3. Find maximum likelihood estimators (MLEs) of $\theta$ based on a random sample of size $n$ for
each of the pdfs in Exercise 1.


4. Find the MLEs based on a random sample $X_1, \ldots, X_n$ from each of the following
distributions:

(a) $X_i \sim \mathrm{BIN}(1, p)$.

(b) $X_i \sim \mathrm{GEO}(p)$.

(c) $X_i \sim \mathrm{NB}(3, p)$.

(d) $X_i \sim N(0, \theta)$.

(e) $X_i \sim \mathrm{GAM}(\theta, 2)$.

(f) $X_i \sim \mathrm{DE}(\theta, 0)$.

(g) $X_i \sim \mathrm{WEI}(\theta, 1/2)$.

(h) $X_i \sim \mathrm{PAR}(1, \kappa)$.


5. Find the MLE of $\theta$ based on a random sample of size $n$ from a distribution with pdf

$$f(x; \theta) = \begin{cases} 2\theta^2 x^{-3} & \theta \le x \\ 0 & x < \theta \end{cases} \qquad 0 < \theta$$


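6. Find the MLEs based on a random sample $X_1, \ldots, X_n$ from each of the following pdfs:
(a) $f(x; \theta_1, \theta_2) = 1/(\theta_2 - \theta_1)$; $\theta_1 \le x \le \theta_2$, zero otherwise.
(b) $f(x; \theta, \eta) = \theta\eta^{\theta}x^{-\theta - 1}$; $\eta \le x$, zero otherwise; $0 < \theta$, $0 < \eta < \infty$.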


7. Let $X_1, \ldots, X_n$ be a random sample from a geometric distribution, $X \sim \mathrm{GEO}(p)$. Find the
MLEs of the following quantities:

(a) $E(X) = 1/p$.

(b) $\mathrm{Var}(X) = (1 - p)/p^2$.

(c) $P[X > k] = (1 - p)^k$ for arbitrary $k = 1, 2, \ldots$

Hint: Use the invariance property of MLEs.


8. Based on a random sample of size $n$ from a normal distribution, $X \sim N(\mu, \sigma^2)$, find the
MLEs of the following:

(a) $P[X > c]$ for arbitrary $c$.

(b) The 95th percentile of $X$.


9. Suppose that $x_{1:n}$ and $x_{n:n}$ are the smallest and largest observed values of a random
sample of size $n$ from a distribution with pdf $f(x; \theta)$, $0 < \theta$.
(a) If $f(x; \theta) = 1$ for $\theta - 0.5 \le x \le \theta + 0.5$, zero otherwise, show that any value $\hat{\theta}$
such that $x_{n:n} - 0.5 \le \hat{\theta} \le x_{1:n} + 0.5$ is an ML estimate of $\theta$.
(b) If $f(x; \theta) = 1/\theta$ for $\theta \le x \le 2\theta$, zero otherwise, show that $\hat{\theta} = 0.5x_{n:n}$ is an ML
estimate of $\theta$.


10. Consider a random sample of size $n$ from a double exponential distribution,
$X_i \sim \mathrm{DE}(\theta, \eta)$.
(a) Find the MLE of $\eta$ when $\theta = 1$. Hint: Show first that if $x_1, \ldots, x_n$ are observed
values, then the sum $\sum_{i=1}^{n} |x_i - a|$ is minimized when $a$ is the sample median.

(b) Find the MLEs when both $\theta$ and $\eta$ are unknown.


11. Consider a random sample of size $n$ from a Pareto distribution, $X_i \sim \mathrm{PAR}(\theta, 2)$.
(a) Find the ML equation (9.2.7).
(b) From the data of Example 4.6.2, compute the ML estimate, $\hat{\theta}$, to three decimal
places. Note: The ML equation cannot be solved explicitly for $\hat{\theta}$, but it can be
solved numerically, by an iterative method, or by trial and error.
12. Let $Y_1, \ldots, Y_n$ be a random sample from a lognormal distribution, $Y \sim \mathrm{LOGN}(\mu, \sigma^2)$.
(a) Find the MLEs of $\mu$ and $\sigma^2$.
(b) Find the MLE of $E(Y)$.
Hint: Recall that $Y \sim \mathrm{LOGN}(\mu, \sigma^2)$ if $\ln(Y) \sim N(\mu, \sigma^2)$, or, equivalently, $Y = \exp(X)$
where $X \sim N(\mu, \sigma^2)$.


13. Consider independent random samples $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ from normal
distributions with a common mean $\mu$, but possibly different variances, $\sigma_1^2$ and $\sigma_2^2$, so that
$X_i \sim N(\mu, \sigma_1^2)$ and $Y_j \sim N(\mu, \sigma_2^2)$. Find the MLEs of $\mu$, $\sigma_1^2$, and $\sigma_2^2$.


14. Let $X$ be the number of independent trials of some component until it fails, where $1 - p$ is
the probability of failure on each trial. We record the exact number of trials, $Y = X$, if
$X \le r$; otherwise we record $Y = r + 1$, where $r$ is a fixed positive integer.

(a) Show that the discrete pdf of $Y$ is

$$f(y; p) = \begin{cases} (1 - p)p^{y - 1} & y = 1, 2, \ldots, r \\ p^r & y = r + 1 \end{cases}$$

(b) Let $Y_1, \ldots, Y_n$ be a random sample from $f(y; p)$. Find the MLE of $p$.
Hint: $f(y; p) = c(y; p)p^{y - 1}$, where $c(y; p) = 1 - p$ if $y = 1, \ldots, r$ and $c(r + 1; p) = 1$. It follows
that $\prod_{i=1}^{n} c(y_i; p) = (1 - p)^m$, where $m$ is the number of observed $y_i$ that are less than $r + 1$.


15. Let $X \sim \mathrm{BIN}(n, p)$ and $\hat{p} = X/n$.
(a) Find a constant $c$ so that $E[c\hat{p}(1 - \hat{p})] = p(1 - p)$.
(b) Find an unbiased estimator of $\mathrm{Var}(X)$.

(c) Consider a random sample of size $N$ from $\mathrm{BIN}(n, p)$. Find unbiased estimators of $p$
and $\mathrm{Var}(X)$ based on the random sample.


16. (Truncated Poisson.) Let $X \sim \mathrm{POI}(\mu)$, and suppose we cannot observe $X = 0$, so the
observed random variable, $Y$, has discrete pdf

$$f(y; \mu) = \begin{cases} \dfrac{e^{-\mu}\mu^y}{y!\,(1 - e^{-\mu})} & y = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

We desire to estimate $P[X > 0] = 1 - e^{-\mu}$. Show that an unbiased (but unreasonable)
estimator of $1 - e^{-\mu}$ is given by $u(Y)$, where $u(y) = 0$ if $y$ is odd, and $u(y) = 2$ if $y$ is even.
Hint: Consider the power series expansion of $(e^{\mu} + e^{-\mu})/2$.


17. Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution, $X_i \sim \mathrm{UNIF}(\theta - 1, \theta + 1)$.
(a) Show that the sample mean, $\bar{X}$, is an unbiased estimator of $\theta$.
(b) Show that the "midrange," $(X_{1:n} + X_{n:n})/2$, is an unbiased estimator of $\theta$.


18. Suppose that $X$ is continuous and its pdf, $f(x; \mu)$, is symmetric about $\mu$. That is,
$f(\mu + c; \mu) = f(\mu - c; \mu)$ for all $c > 0$.
(a) Show that for a random sample of size $n$, where $n$ is odd ($n = 2k - 1$), the sample
median, $X_{k:n}$, is an unbiased estimator of $\mu$.
(b) Show that $Z = X_{1:n} - \mu$ and $W = \mu - X_{n:n}$ have the same distribution, and thus
that the midrange, $(X_{1:n} + X_{n:n})/2$, is unbiased for $\mu$.


19. Consider a random sample of size $n$ from a uniform distribution, $X_i \sim \mathrm{UNIF}(-\theta, \theta)$,
$\theta > 0$. Find a constant $c$ so that $c(X_{n:n} - X_{1:n})$ is an unbiased estimator of $\theta$.


20. Let $S$ be the sample standard deviation, based on a random sample of size $n$ from a
distribution with pdf $f(x; \mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$.
(a) Show that $E(S) \le \sigma$, where equality holds if and only if $f(x; \mu, \sigma^2)$ is degenerate at $\mu$,
$P[X = \mu] = 1$. Hint: Consider $\mathrm{Var}(S)$.
(b) If $X_i \sim N(\mu, \sigma^2)$, find a constant $c$ such that $cS$ is an unbiased estimator of $\sigma$. Hint:
Use the fact that $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$ and $S = (S^2)^{1/2}$.
(c) Relative to (b), find a function of $\bar{X}$ and $S$ that is unbiased for the 95th percentile of
$X \sim N(\mu, \sigma^2)$.


21. Consider a random sample of size $n$ from a Bernoulli distribution, $X_i \sim \mathrm{BIN}(1, p)$.
(a) Find the CRLB for the variances of unbiased estimators of $p$.
(b) Find the CRLB for the variances of unbiased estimators of $p(1 - p)$.
(c) Find a UMVUE of $p$.


22. Consider a random sample of size $n$ from a normal distribution, $X_i \sim N(\mu, 9)$.
(a) Find the CRLB for variances of unbiased estimators of $\mu$.
(b) Is the MLE, $\hat{\mu} = \bar{X}$, a UMVUE of $\mu$?
(c) Is the MLE of the 95th percentile a UMVUE?


23. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution, $N(0, \theta)$.
(a) Is the MLE, $\hat{\theta}$, an unbiased estimator of $\theta$?
(b) Is $\hat{\theta}$ a UMVUE of $\theta$?


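24. Let $X \sim \mathrm{POI}(\mu)$, and let $\theta = P[X = 0] = e^{-\mu}$.
(a) Is $\hat{\theta} = e^{-X}$ an unbiased estimator of $\theta$?
(b) Show that $\tilde{\theta} = u(X)$ is an unbiased estimator of $\theta$, where $u(0) = 1$ and $u(x) = 0$ if
$x = 1, 2, \ldots$
(c) Compare the MSEs of $\hat{\theta}$ and $\tilde{\theta}$ for estimating $\theta = e^{-\mu}$ when $\mu = 1$ and $\mu = 2$.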


25. Consider the estimator $T_1 = 1/\bar{X}$ of Example 9.3.2. Compare the MSEs of $T_1$ and $cT_1$ for
estimating $1/\theta$, where $c = (n - 1)/n$.


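26. Consider a random sample of size $n$ from a distribution with pdf $f(x; \theta) = 1/\theta$ if
$0 < x \le \theta$, and zero otherwise; $0 < \theta$.

(a) Find the MLE $\hat{\theta}$.

(b) Find the MME $\tilde{\theta}$.

(c) Is $\hat{\theta}$ unbiased?

(d) Is $\tilde{\theta}$ unbiased?

(e) Compare the MSEs of $\hat{\theta}$ and $\tilde{\theta}$.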
27. Consider a random sample of size $n = 2$ from a normal distribution, $X_i \sim N(\theta, 1)$, where
$\Omega = \{\theta \mid 0 \le \theta \le 1\}$. Define estimators as follows: $\hat{\theta}_1 = (1/2)X_1 + (1/2)X_2$, $\hat{\theta}_2 = (1/4)X_1 + (3/4)X_2$, $\hat{\theta}_3 = (2/3)X_1$, and $\hat{\theta}_4 = (2/3)\hat{\theta}_1$. Consider squared error loss $L(t; \theta) = (t - \theta)^2$.
(a) Compare the risk functions for these estimators.
(b) Compare the estimators by the minimax principle.
(c) Find the Bayes risk of the estimators, using $\theta \sim \mathrm{UNIF}(0, 1)$.
(d) Find the Bayes risk of the estimators, using $\theta \sim \mathrm{BETA}(2, 1)$.

28. Let $X_1, \ldots, X_n$ be a random sample from $\mathrm{EXP}(\theta)$, and define $\hat{\theta}_1 = \bar{X}$ and
$\hat{\theta}_2 = n\bar{X}/(n + 1)$.
(a) Find the variances of $\hat{\theta}_1$ and $\hat{\theta}_2$.
(b) Find the MSEs of $\hat{\theta}_1$ and $\hat{\theta}_2$.
(c) Compare the variances of $\hat{\theta}_1$ and $\hat{\theta}_2$ for $n = 2$.
(d) Compare the MSEs of $\hat{\theta}_1$ and $\hat{\theta}_2$ for $n = 2$.
(e) Find the Bayes risk of $\hat{\theta}_1$, using $\theta \sim \mathrm{EXP}(2)$.


29. Consider a random sample of size $n$ from a Bernoulli distribution, $X_i \sim \mathrm{BIN}(1, p)$. For a
uniform prior density, $p \sim \mathrm{UNIF}(0, 1)$, and squared error loss, find the following:


(a) Bayes estimator of $p$.
(b) Bayes estimator of $p(1 - p)$.
(c) Bayes risk for the estimator in (a).


30. Let $X \sim \mathrm{POI}(\mu)$, and consider the loss function $L(t; \mu) = (t - \mu)^2/\mu$. Assume a gamma
prior density, $\mu \sim \mathrm{GAM}(\theta, \kappa)$, where $\theta$ and $\kappa$ are known.


(a) Find the Bayes estimator of $\mu$.
(b) Show that $\hat{\mu} = X$ is the minimax estimator.


31. Let $\hat{\theta}$ and $\tilde{\theta}$ be the MLE and MME, respectively, for $\theta$ in Exercise 26.
(a) Show that $\hat{\theta}$ is MSE consistent.
(b) Show that $\tilde{\theta}$ is MSE consistent.


32. Show that the MLE of $\theta$ in Exercise 5 is simply consistent.


33. Consider a random sample of size $n$ from a Poisson distribution, $X_i \sim \mathrm{POI}(\mu)$.
(a) Find the CRLB for the variances of unbiased estimators of $\mu$.
(b) Find the CRLB for the variances of unbiased estimators of $\theta = e^{-\mu}$.
(c) Find a UMVUE of $\mu$.
(d) Find the MLE $\hat{\theta}$ of $\theta$.
(e) Is $\hat{\theta}$ an unbiased estimator of $\theta$?
(f) Is $\hat{\theta}$ asymptotically unbiased?
(g) Show that $\tilde{\theta} = [(n - 1)/n]^{\sum X_i}$ is an unbiased estimator of $\theta$.
(h) Find $\mathrm{Var}(\tilde{\theta})$ and compare to the CRLB of (b).
Hint: Note that $Y = \sum X_i \sim \mathrm{POI}(n\mu)$, and that $E(\tilde{\theta})$ and $\mathrm{Var}(\tilde{\theta})$ are related to the MGF
of $Y$.
34. Consider a random sample of size $n$ from a distribution with discrete pdf
$f(x; p) = p(1 - p)^x$; $x = 0, 1, \ldots$, zero otherwise.
(a) Find the MLE of $p$.
(b) Find the MLE of $\theta = (1 - p)/p$.
(c) Find the CRLB for variances of unbiased estimators of $\theta$.
(d) Is the MLE of $\theta$ a UMVUE?
(e) Is the MLE of $\theta$ MSE consistent?
(f) Find the asymptotic distribution of the MLE of $\theta$.
(g) Let $\tilde{\theta} = n\bar{X}/(n + 1)$. Find risk functions of both $\tilde{\theta}$ and $\bar{X}$ using the loss function
$L(t; \theta) = (t - \theta)^2/(\theta^2 + \theta)$.


35. Find the asymptotic distribution of the MLE of $p$ in Exercise 4(a).

36. Find the asymptotic distribution of the MLE of $\theta$ in Exercise 4(d).


37. Let $X_1, \ldots, X_n$ be a random sample with an odd sample size ($n = 2k - 1$).
(a) If $X_i \sim \mathrm{DE}(1, \eta)$, find the asymptotic relative efficiency, as defined by equation
(9.4.6), of $\hat{\eta}_1 = \bar{X}$ relative to $\hat{\eta}_2 = X_{k:n}$.
(b) If $X_i \sim N(\mu, 1)$, find the asymptotic relative efficiency, as defined by equation (9.4.6),
of $\hat{\mu}_1 = X_{k:n}$ relative to $\hat{\mu}_2 = \bar{X}$.

38. An estimator $\hat{\theta}$ is said to be median unbiased if $P[\hat{\theta} < \theta] = P[\hat{\theta} > \theta]$. Consider a random
sample of size $n$ from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$.
(a) Find a median unbiased estimator of $\theta$ that has the form $\tilde{\theta} = c\bar{X}$.
(b) Find the relative efficiency of $\tilde{\theta}$ compared to $\bar{X}$.
(c) Compare the MSEs of $\tilde{\theta}$ and $\bar{X}$ when $n = 5$.


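39. Suppose that $\hat{\theta}_i$, $i = 1, \ldots, n$, are independent unbiased estimators of $\theta$ with $\mathrm{Var}(\hat{\theta}_i) = \sigma_i^2$.
Consider a combined estimator $\hat{\theta}_c = \sum_{i=1}^{n} a_i\hat{\theta}_i$, where $\sum_{i=1}^{n} a_i = 1$.

(a) Show that $\hat{\theta}_c$ is unbiased.
(b) It can be shown that $\mathrm{Var}(\hat{\theta}_c)$ is minimized by letting $a_i = (1/\sigma_i^2) / \sum_{j=1}^{n} (1/\sigma_j^2)$. Verify this
for the case $n = 2$.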


40. Let $X$ be a random variable with CDF $F(x)$.
(a) Show that $E[(X - c)^2]$ is minimized by the value $c = E(X)$.
(b) Assuming that $X$ is continuous, show that $E[|X - c|]$ is minimized if $c$ is the
median, that is, the value such that $F(c) = 1/2$.


41. Prove Theorem 9.5.2. Hint: Use Exercise 40(a) applied to the posterior distribution for
fixed $x$.


42. Prove Theorem 9.5.3. Hint: Use Exercise 40(b).


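43. Consider the functions $L(\theta)$, $L^*(\tau)$, and $u(\theta)$ in the discussion preceding Theorem 9.2.1.
Show that $\hat{\tau} = u(\hat{\theta})$ maximizes $L^*(\tau)$ if $\hat{\theta}$ maximizes $L(\theta)$. Hint: Note that $L^*(\tau) \le L(\hat{\theta})$ for all
$\tau$ in the range of the function $u(\theta)$, and $L(\hat{\theta}) = L^*(\hat{\tau})$ if $\hat{\tau} = u(\hat{\theta})$.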
44. Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential distribution with mean $1/\theta$,
$f(x \mid \theta) = \theta\exp(-\theta x)$ for $x > 0$, and assume that the prior density of $\theta$ also is exponential
with mean $1/\beta$, where $\beta$ is known.

(a) Show that the posterior distribution is $\theta \mid x \sim \mathrm{GAM}[(\beta + \sum x_i)^{-1}, n + 1]$.

(b) Using squared error loss, find the Bayes estimator of $\theta$.

(c) Using squared error loss, find the Bayes estimator of $\mu = 1/\theta$.

(d) Using absolute error loss, find the Bayes estimator of $\theta$. Use chi-square notation to
express the solution.

(e) Using absolute error loss, find the Bayes estimator of $\mu = 1/\theta$.


SUFFICIENCY AND COMPLETENESS


10.1 INTRODUCTION


Chapter 9 presented methods for deriving point estimators based on a random 
sample to estimate unknown parameters of the population distribution. In some 
cases, it is possible to show, in a certain sense, that a particular statistic or set of 
statistics contains all of the “information” in the sample about the parameters. It 
then would be reasonable to restrict attention to such statistics when estimating 
or otherwise making inferences about the parameters. 

More generally, the idea of sufficiency involves the reduction of a data set to a
more concise set of statistics with no loss of information about the unknown
parameter. Roughly, a statistic $S$ will be considered a "sufficient" statistic for a
parameter $\theta$ if the conditional distribution of any other statistic $T$ given the value
of $S$ does not involve $\theta$. In other words, once the value of a sufficient statistic is
known, the observed value of any other statistic does not contain any further
information about the parameter.
Example 10.1.1 A coin is tossed $n$ times, and the outcome is recorded for each toss. As usual, this
process could be modeled in terms of a random sample $X_1, \ldots, X_n$ from a Bernoulli
distribution. Suppose that it is not known whether the coin is fair, and we
wish to estimate $\theta = P(\text{head})$. It would seem that the total number of heads,
$S = \sum X_i$, should provide as much information about the value of $\theta$ as the actual
outcomes. To check this out, consider

$$f(x_1, \ldots, x_n; \theta) = \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i} \qquad x_i = 0, 1$$

We also know that $S \sim \mathrm{BIN}(n, \theta)$, so that

$$f_S(s; \theta) = \binom{n}{s}\theta^s(1 - \theta)^{n - s} \qquad s = 0, 1, \ldots, n$$

If $\sum x_i = s$, then the events $[X_1 = x_1, \ldots, X_n = x_n, S = s]$ and
$[X_1 = x_1, \ldots, X_n = x_n]$ are equivalent, and

$$f_{X|s}(x_1, \ldots, x_n) = \frac{P[X_1 = x_1, \ldots, X_n = x_n, S = s]}{P[S = s]} = \frac{f(x_1, \ldots, x_n; \theta)}{f_S(s; \theta)} = \frac{\theta^s(1 - \theta)^{n - s}}{\binom{n}{s}\theta^s(1 - \theta)^{n - s}} = \frac{1}{\binom{n}{s}}$$

If $\sum x_i \neq s$, then the conditional pdf is zero. In either case, it does not involve $\theta$.
Furthermore, let $T = t(X_1, \ldots, X_n)$ be any other statistic, and define the set
$C_t = \{(x_1, \ldots, x_n) \mid t(x_1, \ldots, x_n) = t\}$. The conditional pdf of $T$ given $S = s$ is

$$f_{T|s}(t) = P[T = t \mid S = s] = \sum_{C_t} f_{X|s}(x_1, \ldots, x_n)$$

which also does not involve $\theta$.
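The coin-tossing argument of Example 10.1.1 can be checked by brute force for a small $n$. The sketch below enumerates every 0/1 outcome with $\sum x_i = s$ and computes $P[X = x \mid S = s]$ exactly with rational arithmetic for several values of $\theta$. It confirms that every conditional probability equals $1/\binom{n}{s}$, whatever $\theta$ is. The function name `conditional_given_s` and the particular $n$, $s$, and $\theta$ values are illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_given_s(n, s, theta):
    """P[X = x | S = s] for each 0/1 vector x with sum(x) == s, where
    X_1, ..., X_n are iid Bernoulli(theta) and S = X_1 + ... + X_n."""
    def joint(x):
        k = sum(x)
        return theta**k * (1 - theta) ** (n - k)

    outcomes = [x for x in product((0, 1), repeat=n) if sum(x) == s]
    p_s = sum(joint(x) for x in outcomes)  # this sum is f_S(s; theta)
    return {x: joint(x) / p_s for x in outcomes}


n, s = 4, 2
expected = Fraction(1, comb(n, s))  # 1 / C(n, s)
for theta in (Fraction(1, 5), Fraction(1, 2), Fraction(3, 4)):
    probs = conditional_given_s(n, s, theta)
    # every conditional probability equals 1/C(4, 2) = 1/6, whatever theta is
    print(theta, set(probs.values()) == {expected})
```

Exact `Fraction` arithmetic is used so that "free of $\theta$" can be tested by equality rather than with a floating-point tolerance.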


It will be desirable to have a more precise definition of sufficiency that covers more
than one parameter and a set of jointly sufficient statistics.


10.2 SUFFICIENT STATISTICS


As in the previous chapter, a set of data $x_1, \ldots, x_n$ will be modeled mathematically
as observed values of a set of random variables $X_1, \ldots, X_n$. For convenience,
we will use vector notation, $X = (X_1, \ldots, X_n)$ and $x = (x_1, \ldots, x_n)$, to
refer to the observed random variables and their possible values. We also will
allow the possibility of a vector-valued parameter $\theta$ and vector-valued statistics $S$
and $T$.


Definition 10.2.1


Jointly Sufficient Statistics Let $X = (X_1, \ldots, X_n)$ have joint pdf $f(x; \theta)$, and let
$S = (S_1, \ldots, S_k)$ be a $k$-dimensional statistic. Then $S_1, \ldots, S_k$ is a set of jointly sufficient
statistics for $\theta$ if for any other vector of statistics, $T$, the conditional pdf of $T$
given $S = s$, denoted by $f_{T|s}(t)$, does not depend on $\theta$. In the one-dimensional case,
we simply say that $S$ is a sufficient statistic for $\theta$.


Again, the idea is that if $S$ is observed, then additional information about $\theta$
cannot be obtained from $T$ if the conditional distribution of $T$ given $S = s$ is free
of $\theta$. We usually will assume that $X_1, \ldots, X_n$ is a random sample from a population
pdf $f(x; \theta)$, and for convenience we often will refer to the vector
$X = (X_1, \ldots, X_n)$ as the random sample. However, in general, $X$ could represent
some other vector of observed random variables, such as a censored sample or
some other set of order statistics.

The primary purpose is to reduce the sample to the smallest set of sufficient 
statistics, referred to as a “minimal set” of sufficient statistics. If k unknown 
parameters are present in the model, then quite often there will exist a set of k 
sufficient statistics. In some cases, the number of sufficient statistics will exceed 
the number of parameters, and indeed in some cases no reduction in the number 
of statistics is possible. The whole sample is itself a set of sufficient statistics, but 
when we refer to sufficient statistics we ordinarily will be thinking of some 
smaller set of sufficient statistics. 


Definition 10.2.2


A set of statistics is called a minimal sufficient set if the members of the set are
jointly sufficient for the parameters and if they are a function of every other set of
jointly sufficient statistics.
For example, the order statistics will be shown to be jointly sufficient. In a
sense, this does represent a reduction of the sample, although the number of
statistics in this case is not reduced. In some cases, the order statistics may be a
minimal sufficient set, but of course we hope to reduce the sample to a small
number of jointly sufficient statistics.

Clearly one cannot actually consider all possible statistics, $T$, in attempting to
use Definition 10.2.1 to verify that $S$ is a sufficient statistic. However, because $T$
may be written as a function of the sample, $X = (X_1, \ldots, X_n)$, one possible
approach would be to show that $f_{X|s}(x)$ is free of $\theta$. Actually, this approach was
used in Example 10.1.1, where $X$ was a random sample from a Bernoulli distribution.
Essentially the same derivation could be used in the more general situation
where $X$ is discrete and $S$ and $\theta$ are vector-valued. Suppose that $S = (S_1, \ldots, S_k)$,
where $S_j = d_j(X_1, \ldots, X_n)$ for $j = 1, \ldots, k$, and denote by $d(x_1, \ldots, x_n)$ the vector-valued
function whose $j$th coordinate is $d_j(x_1, \ldots, x_n)$. In a manner analogous to
Example 10.1.1, the conditional pdf of $X = (X_1, \ldots, X_n)$ given $S = s$ can be
written as

$$f_{X|s}(x_1, \ldots, x_n) = \begin{cases} \dfrac{f(x_1, \ldots, x_n; \theta)}{f_S(s; \theta)} & \text{if } d(x_1, \ldots, x_n) = s \\ 0 & \text{otherwise} \end{cases} \qquad (10.2.1)$$


This would not be a standard situation for continuous random variables,
because we have an $n$-dimensional vector of random variables with the distribution
of probability restricted to an $(n - k)$-dimensional subspace. Consequently,
care must be taken with regard to the meaning of an expression such as (10.2.1)
in the continuous case.

In general, we can say that $S_1, \ldots, S_k$ are jointly sufficient for $\theta$ if equation
(10.2.1) is free of $\theta$.

Some authors avoid any concern with technical difficulties by directly defining
$S_1, \ldots, S_k$ to be jointly sufficient for $\theta$ if $f(x_1, \ldots, x_n; \theta)/f_S(s; \theta)$ is free of $\theta$. In any
event, equation (10.2.1) will be used here without resorting to a more careful
mathematical development.


Example 10.2.1 Consider a random sample from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$. It
follows that

$$f(x_1, \ldots, x_n; \theta) = \frac{1}{\theta^n}\exp\left(-\sum x_i/\theta\right) \qquad x_i > 0$$

which suggests checking the statistic $S = \sum X_i$. We also know that
$S \sim \mathrm{GAM}(\theta, n)$, so that

$$f_S(s; \theta) = \frac{s^{n-1}e^{-s/\theta}}{\theta^n\Gamma(n)} \qquad s > 0$$

If $s = \sum x_i$, then

$$\frac{f(x_1, \ldots, x_n; \theta)}{f_S(s; \theta)} = \frac{\Gamma(n)}{s^{n-1}}$$

which is free of $\theta$, and thus by equation (10.2.1) $S$ is sufficient for $\theta$.


A slightly simpler criterion also can be derived. In particular, if $S_1, \ldots, S_k$ are
jointly sufficient for $\theta$, then

$$f(x_1, \ldots, x_n; \theta) = f_S(s; \theta)f_{X|s}(x_1, \ldots, x_n) = g(s; \theta)h(x_1, \ldots, x_n) \qquad (10.2.2)$$

That is, the joint pdf of the sample can be factored into a function of $s$ and $\theta$
times a function of $x = (x_1, \ldots, x_n)$ that does not involve $\theta$. Conversely, suppose
that $f(x_1, \ldots, x_n; \theta) = g(s; \theta)h(x_1, \ldots, x_n)$, where it is assumed that for fixed $s$,
$h(x_1, \ldots, x_n)$ does not depend on $\theta$. Note that this means that if the joint pdf of
$X_1, \ldots, X_n$ is zero over some region of the $x_i$'s, then it must be possible to identify
this region in terms of $s$ and $\theta$, and in terms of $x$ and $s$, without otherwise involving
the $x$ with the $\theta$. If this is not possible, then the joint pdf really is not completely
specified in the form stated. Basically, then, if equation (10.2.2) holds for
some functions $g$ and $h$, then the marginal pdf of $S$ must be in the form

$$f_S(s; \theta) = g(s; \theta)c(s)$$

because for fixed $s$, integrating or summing over the remaining variables cannot
bring $\theta$ into the function. Thus

$$f(x_1, \ldots, x_n; \theta) = f_S(s; \theta)h(x_1, \ldots, x_n)/c(s)$$

and

$$\frac{f(x_1, \ldots, x_n; \theta)}{f_S(s; \theta)} = \frac{h(x_1, \ldots, x_n)}{c(s)}$$

which is independent of $\theta$. This provides the outline of the proof of the following
theorem.


Theorem 10.2.1 Factorization Criterion If $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$, and if
$S = (S_1, \ldots, S_k)$, then $S_1, \ldots, S_k$ are jointly sufficient for $\theta$ if and only if

$$f(x_1, \ldots, x_n; \theta) = g(s; \theta)h(x_1, \ldots, x_n) \qquad (10.2.3)$$

where $g(s; \theta)$ does not depend on $x_1, \ldots, x_n$ except through $s$, and $h(x_1, \ldots, x_n)$
does not involve $\theta$.
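For the exponential sample considered above, the ratio $f(x_1, \ldots, x_n; \theta)/f_S(s; \theta)$ can be evaluated numerically at several values of $\theta$ and compared with $\Gamma(n)/s^{n-1}$. This is a minimal sketch. It uses the text's scale parameterization, in which $\mathrm{EXP}(\theta)$ has mean $\theta$ and $\mathrm{GAM}(\theta, n)$ has scale $\theta$ and shape $n$. The sample values are made up for illustration and are not data from the text, and the helper names are hypothetical.

```python
import math


def joint_pdf(xs, theta):
    # f(x_1, ..., x_n; theta) for iid EXP(theta) with mean theta
    return math.prod(math.exp(-x / theta) / theta for x in xs)


def gamma_pdf_of_sum(s, theta, n):
    # f_S(s; theta) for S ~ GAM(theta, n): scale theta, shape n
    return s ** (n - 1) * math.exp(-s / theta) / (theta**n * math.gamma(n))


xs = [0.7, 2.1, 1.3, 0.4]  # illustrative sample, not from the text
n, s = len(xs), sum(xs)
for theta in (0.5, 1.0, 3.0):
    print(theta, joint_pdf(xs, theta) / gamma_pdf_of_sum(s, theta, n))
print(math.gamma(n) / s ** (n - 1))  # the theta-free value Gamma(n)/s^(n-1)
```

The printed ratios agree (up to rounding) for every $\theta$. This is the numerical counterpart of the factorization $g(s; \theta) = \theta^{-n}e^{-s/\theta}$ with $h(x) = 1$ on the positive orthant.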
Example 10.2.2 Consider the random sample of Example 10.1.1, where $X_i \sim \mathrm{BIN}(1, \theta)$. In that
example, we conjectured that $S = \sum X_i$ would be sufficient, and then verified it
directly by deriving the conditional pdf of $X$ given $S = s$. The procedure is somewhat
simpler if we use the factorization criterion. In particular, we have

$$f(x_1, \ldots, x_n; \theta) = \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i} = \theta^s(1 - \theta)^{n - s} = g(s; \theta)h(x_1, \ldots, x_n)$$

where $s = \sum x_i$ and, in this case, we define $h(x_1, \ldots, x_n) = 1$ if all $x_i = 0$ or 1, and
zero otherwise. It should be noted that the sample proportion, $\hat{\theta} = S/n$, also is
sufficient for $\theta$. In general, if a statistic $S$ is sufficient for $\theta$, then any one-to-one
function of $S$ also is sufficient for $\theta$.


It is important to specify completely the functions involved in the factorization
criterion, including the identification of regions of zero probability. The following
example shows that care must be exercised in this matter.


Example 10.2.3 Consider a random sample from a uniform distribution, $X_i \sim \mathrm{UNIF}(0, \theta)$, where
$\theta$ is unknown. The joint pdf of $X_1, \ldots, X_n$ is

$$f(x_1, \ldots, x_n; \theta) = \frac{1}{\theta^n} \qquad 0 < x_i < \theta, \; i = 1, \ldots, n$$

and zero otherwise. It is easier to specify this pdf in terms of the minimum, $x_{1:n}$,
and maximum, $x_{n:n}$, of $x_1, \ldots, x_n$. In particular,

$$f(x_1, \ldots, x_n; \theta) = \frac{1}{\theta^n} \qquad 0 < x_{1:n}, \; x_{n:n} < \theta$$

which means that

$$f(x_1, \ldots, x_n; \theta) = g(x_{n:n}; \theta)h(x_1, \ldots, x_n)$$

where $g(s; \theta) = 1/\theta^n$ if $s < \theta$ and zero otherwise, and $h(x_1, \ldots, x_n) = 1$ if $0 < x_{1:n}$,
and zero otherwise. It follows from the factorization criterion that the largest
order statistic, $S = X_{n:n}$, is sufficient for $\theta$.


This type of problem is made clearer by using "indicator function" notation,
which allows the conditions on the limits of the variables to be incorporated
directly into the functional form of the pdf.
Definition 10.2.3
If $A$ is a set, then the indicator function of $A$, denoted by $I_A$, is defined as

$$I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$


In the previous example, if we let $A = (0, \theta)$, then
$f(x; \theta) = (1/\theta)I_{(0,\theta)}(x)$
so that

$$f(x_1, \ldots, x_n; \theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} I_{(0,\theta)}(x_i)$$

Because $\prod_{i=1}^{n} I_{(0,\theta)}(x_i) = 1$ if and only if $0 < x_{1:n}$ and $x_{n:n} < \theta$, equation
(10.2.3) is satisfied with $s = x_{n:n}$, $g(s; \theta) = (1/\theta^n)I_{(0,\theta)}(s)$, and $h(x_1, \ldots, x_n) = I_{(0,\infty)}(x_{1:n})$.
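The indicator-function factorization for $\mathrm{UNIF}(0, \theta)$ can be checked mechanically. The sketch below computes the joint pdf directly as a product of indicators and again as $g(x_{n:n}; \theta)h(x)$, then compares the two over a few illustrative samples and values of $\theta$. One sample contains a negative value so that the role of $h$ is visible. The helper names are hypothetical.

```python
import math


def indicator(condition):
    return 1.0 if condition else 0.0


def joint_pdf_direct(xs, theta):
    # f(x_1, ..., x_n; theta) = prod_i (1/theta) I_(0, theta)(x_i)
    return math.prod(indicator(0 < x < theta) / theta for x in xs)


def joint_pdf_factored(xs, theta):
    # g(s; theta) h(x) with s = x_{n:n}, g(s; theta) = (1/theta^n) I_(0,theta)(s),
    # and h(x) = I_(0, inf)(x_{1:n})
    n = len(xs)
    s = max(xs)
    g = indicator(0 < s < theta) / theta**n
    h = indicator(min(xs) > 0)
    return g * h


for xs in ([0.2, 0.9, 0.5], [0.2, -0.1, 0.5]):
    for theta in (0.3, 0.9, 1.0, 2.0):
        print(xs, theta, joint_pdf_direct(xs, theta), joint_pdf_factored(xs, theta))
```

The factored form needs only the sample maximum inside $g$. The minimum enters only through $h$, which does not involve $\theta$.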


Example 10.2.4 Consider a random sample from a normal distribution, $X_i \sim N(\mu, \sigma^2)$, where
both $\mu$ and $\sigma^2$ are unknown. It follows that

$$f(x_1, \ldots, x_n; \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum (x_i - \mu)^2\right]$$

Because $\sum (x_i - \mu)^2 = \sum x_i^2 - 2\mu\sum x_i + n\mu^2$, it follows that equation (10.2.3)
holds with $s_1 = \sum x_i$, $s_2 = \sum x_i^2$,

$$g(s_1, s_2; \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}(s_2 - 2\mu s_1 + n\mu^2)\right]$$

and $h(x_1, \ldots, x_n) = 1$. Thus, by the factorization criterion, $S_1 = \sum X_i$ and $S_2 = \sum X_i^2$
are jointly sufficient for $\theta = (\mu, \sigma^2)$. Notice also that the MLEs, $\hat{\mu} = \bar{X} = S_1/n$
and $\hat{\sigma}^2 = \sum (X_i - \bar{X})^2/n = S_2/n - (S_1/n)^2$, correspond to a one-to-one
transformation of $S_1$ and $S_2$, so that $\hat{\mu}$ and $\hat{\sigma}^2$ also are jointly sufficient for $\mu$
and $\sigma^2$.
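One concrete consequence of Example 10.2.4 is that two different samples with the same $(\sum x_i, \sum x_i^2)$ must have identical normal likelihoods for every $(\mu, \sigma^2)$, and therefore identical MLEs. The samples $(1, 5, 6)$ and $(2, 3, 7)$ below were chosen for illustration because both have sum 12 and sum of squares 62. The function names are hypothetical.

```python
import math


def normal_likelihood(xs, mu, sigma2):
    # (2 pi sigma^2)^(-n/2) exp[-sum (x_i - mu)^2 / (2 sigma^2)]
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(-ss / (2 * sigma2))


def mles(xs):
    # mu-hat = S1/n, sigma2-hat = S2/n - (S1/n)^2 (divisor n, as in the text)
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / n


a = [1.0, 5.0, 6.0]  # sum 12, sum of squares 62
b = [2.0, 3.0, 7.0]  # sum 12, sum of squares 62 as well
for mu, sigma2 in [(0.0, 1.0), (4.0, 2.5), (-1.0, 10.0)]:
    print(normal_likelihood(a, mu, sigma2), normal_likelihood(b, mu, sigma2))
print(mles(a), mles(b))  # both give mu-hat = 4 and sigma2-hat = 14/3
```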


In the next section, the general connection between MLEs and sufficient sta- 
tistics will be established. 


When a minimal set of sufficient statistics exists, we might expect the number
of sufficient statistics to be equal to the number of unknown parameters. In the
following example, two statistics are required to obtain sufficiency for a single
parameter.


Example 10.2.5 Consider a random sample from a uniform distribution, $X_i \sim \mathrm{UNIF}(\theta, \theta + 1)$.
Notice that the length of the interval is one unit, but the endpoints are assumed
to be unknown. The pdf of $X$ is the indicator function of the interval, $f(x; \theta) = I_{(\theta, \theta+1)}(x)$,
so the joint pdf of $X_1, \ldots, X_n$ is

$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} I_{(\theta, \theta+1)}(x_i)$$

This function assumes the value 1 if and only if $\theta < x_i$ and $x_i < \theta + 1$ for all $x_i$,
so that

$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} I_{(\theta, \infty)}(x_i)\prod_{i=1}^{n} I_{(-\infty, \theta+1)}(x_i) = I_{(\theta, \infty)}(x_{1:n})I_{(-\infty, \theta+1)}(x_{n:n})$$

which shows, by the factorization criterion, that the smallest and largest order
statistics $S_1 = X_{1:n}$ and $S_2 = X_{n:n}$ are jointly sufficient for $\theta$. Actually, it can be
shown that $S_1$ and $S_2$ are minimal sufficient.

Methods for verifying whether a set of statistics is minimal sufficient are
discussed by Wasan (1970), but we will not elaborate on them here.
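The sufficiency of $(X_{1:n}, X_{n:n})$ in Example 10.2.5 shows up directly in the likelihood. Two samples that share the same minimum and maximum have the same likelihood at every $\theta$. The likelihood equals 1 exactly on the interval $x_{n:n} - 1 < \theta < x_{1:n}$. The sketch below checks this on a grid of $\theta$ values. The samples and grid are illustrative and are not data from the text.

```python
def likelihood(xs, theta):
    # prod_i I_(theta, theta + 1)(x_i): equals 1 iff theta < x_{1:n} and x_{n:n} < theta + 1
    return 1 if all(theta < x < theta + 1 for x in xs) else 0


a = [2.30, 2.95, 2.60, 3.10]  # illustrative samples, not data from the text
b = [3.10, 2.40, 2.30]        # different values, same minimum and maximum
grid = [2.0 + k / 100 for k in range(51)]  # theta = 2.00, 2.01, ..., 2.50

same = all(likelihood(a, t) == likelihood(b, t) for t in grid)
maximizers = [round(t, 2) for t in grid if likelihood(a, t) == 1]
print(same)  # True
# grid points strictly between x_{n:n} - 1 = 2.10 and x_{1:n} = 2.30
print(min(maximizers), max(maximizers))
```

Every $\theta$ in that interval maximizes the likelihood, so the MLE is not unique here. This is the same phenomenon as in Exercise 9(a) of Chapter 9.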


10.3 FURTHER PROPERTIES OF SUFFICIENT STATISTICS


It is possible to relate sufficiency to several of the concepts that were discussed in
earlier chapters.


Theorem 10.3.1 If $S_1, \ldots, S_k$ are jointly sufficient for $\theta$ and if $\hat{\theta}$ is a unique maximum likelihood
estimator of $\theta$, then $\hat{\theta}$ is a function of $S = (S_1, \ldots, S_k)$.


Proof By the factorization criterion,

$$L(\theta) = f(x_1, \ldots, x_n; \theta) = g(s; \theta)h(x_1, \ldots, x_n)$$

Because $h(x_1, \ldots, x_n)$ does not involve $\theta$, a value that maximizes the likelihood function
also maximizes $g(s; \theta)$, so it must depend on the data only through $s$, say $\hat{\theta} = t(s)$.
If the MLE is unique, this defines a function of $s$.
Actually, the result can be stated more generally: If there exist jointly sufficient
statistics, and if there exists an MLE, then there exists an MLE that is a function
of the sufficient statistics.

It also follows that if the MLEs, $\hat{\theta}_1, \ldots, \hat{\theta}_k$, are unique and jointly sufficient,
then they are a minimal sufficient set. By Theorem 10.3.1, unique MLEs are a function
of every set of jointly sufficient statistics.

The following simple example shows that it is possible to have a sufficient
statistic $S$ and an MLE that is not a function of $S$.


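Example 10.3.1 Suppose that $X$ is discrete with pdf $f(x; \theta)$ and $\Omega = \{0, 1\}$, where, with the use of
indicator functions,

$$f(x; \theta) = \frac{1}{4}I_{\{1,4\}}(x) + \left(\frac{1 - \theta}{3} + \frac{\theta}{6}\right)I_{\{2\}}(x) + \left(\frac{1 - \theta}{6} + \frac{\theta}{3}\right)I_{\{3\}}(x)$$

If $\varphi(x) = 3I_{\{1,4\}}(x) + 2I_{\{2\}}(x) + 4I_{\{3\}}(x)$, then $S = \varphi(X)$ is sufficient for $\theta$, which can
be seen from the factorization criterion with

$$h(x) = I_{\{1,2,3,4\}}(x) \quad \text{and} \quad g(s; \theta) = \frac{1}{4}I_{\{3\}}(s) + \left(\frac{1 - \theta}{3} + \frac{\theta}{6}\right)I_{\{2\}}(s) + \left(\frac{1 - \theta}{6} + \frac{\theta}{3}\right)I_{\{4\}}(s)$$

Furthermore, more than one MLE exists. For example, the functions $t_1(x) = I_{\{1,3\}}(x)$
and $t_2(x) = I_{\{1,3,4\}}(x)$ both produce MLEs, $\hat{\theta}_1 = t_1(X)$ and $\hat{\theta}_2 = t_2(X)$,
because the corresponding estimates maximize $f(x; \theta)$ for each fixed $x$. Clearly, $\hat{\theta}_1$
is not a function of $S$, because $\varphi(1) = \varphi(4) = 3$ but $t_1(1) = 1$ while $t_1(4) = 0$.
However, $\hat{\theta}_2 = t(S)$, where $t(s) = I_{\{3,4\}}(s)$.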


This shows that some care must be taken in stating the relationship between 
sufficient statistics and MLEs. If the MLE is unique, however, then the situation 
is rather straightforward. 


Example 10.3.2 Consider a random sample of size $n$ from a Bernoulli distribution, $X_i \sim \mathrm{BIN}(1, p)$.
We know that $S = \sum X_i$ is sufficient for $p$, and that $S \sim \mathrm{BIN}(n, p)$.
Thus, we may determine the MLE of $p$ directly from the pdf of $S$, giving $\hat{p} = S/n$
as before.


Theorem 10.3.2 If $S$ is sufficient for $\theta$, then any Bayes estimator will be a function of $S$.


Proof Because the function $h(x_1, \ldots, x_n)$ in the factorization criterion does not depend
on $\theta$, it can be eliminated in equation (9.5.8), and the posterior density $f_{\theta|x}(\theta)$ can
be replaced by

$$f_{\theta|s}(\theta) = \frac{g(s; \theta)p(\theta)}{\int g(s; \theta)p(\theta)\,d\theta}$$

Because a Bayes estimator is computed from the posterior distribution, it depends on $x$ only through $s$.
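To see Theorem 10.3.2 in a concrete case, the sketch below assumes Bernoulli data with a uniform prior on $(0, 1)$ and squared error loss. These are illustrative choices, not part of the theorem. It computes the posterior mean by a simple midpoint rule in three ways: from the full likelihood of two different samples that share $s = 3$, and from the factor $g(s; \theta) = \theta^s(1 - \theta)^{n - s}$ alone. All three agree with the known closed form $(s + 1)/(n + 2)$ of a $\mathrm{BETA}(s + 1, n - s + 1)$ posterior. The function names are hypothetical.

```python
import math


def posterior_mean(weight, grid_size=20_000):
    """Posterior mean of theta on (0, 1) under a uniform prior, p(theta) = 1,
    when the posterior is proportional to weight(theta); midpoint rule."""
    step = 1.0 / grid_size
    num = den = 0.0
    for k in range(grid_size):
        t = (k + 0.5) * step
        w = weight(t)
        num += t * w
        den += w
    return num / den


def full_likelihood(xs):
    # f(x_1, ..., x_n; theta) for iid Bernoulli(theta) data
    return lambda t: math.prod(t if x == 1 else 1 - t for x in xs)


def g_factor(s, n):
    # g(s; theta) = theta^s (1 - theta)^(n - s) from the factorization
    return lambda t: t**s * (1 - t) ** (n - s)


x1 = [1, 0, 0, 1, 1, 0]
x2 = [0, 1, 1, 1, 0, 0]  # different data, same s = 3
print(posterior_mean(full_likelihood(x1)))
print(posterior_mean(full_likelihood(x2)))
print(posterior_mean(g_factor(3, 6)))
print((3 + 1) / (6 + 2))  # closed-form BETA(s + 1, n - s + 1) posterior mean
```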


As mentioned earlier, the order statistics are jointly sufficient. 


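Theorem 10.3.3 If $X_1, \ldots, X_n$ is a random sample from a continuous distribution with pdf $f(x; \theta)$,
then the order statistics form a jointly sufficient set for $\theta$.


Proof For fixed $x_{1:n}, \ldots, x_{n:n}$, and associated $x_1, \ldots, x_n$,

$$f_{X|s}(x_1, \ldots, x_n) = \frac{f(x_1; \theta)\cdots f(x_n; \theta)}{n!\,f(x_{1:n}; \theta)\cdots f(x_{n:n}; \theta)} = \frac{1}{n!}$$

if $x_{1:n} = \min(x_i), \ldots, x_{n:n} = \max(x_i)$, and zero otherwise. The factors cancel because
the numerator and denominator contain the same product taken in a different order.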


Generally, sufficient statistics are involved in the construction of UMVUEs. 


Theorem 10.3.4 Rao-Blackwell Let $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$, and let
$S = (S_1, \ldots, S_k)$ be a vector of jointly sufficient statistics for $\theta$. If $T$ is any unbiased
estimator of $\tau(\theta)$, and if $T^* = E(T \mid S)$, then

1. $T^*$ is an unbiased estimator of $\tau(\theta)$,
2. $T^*$ is a function of $S$, and
3. $\mathrm{Var}(T^*) \le \mathrm{Var}(T)$ for every $\theta$, and $\mathrm{Var}(T^*) < \mathrm{Var}(T)$ for some $\theta$ unless
$T^* = T$ with probability 1.


Proof By sufficiency, $f_{T|s}(t)$ does not involve $\theta$, and thus the function $t^*(s) = E(T \mid s)$
does not depend on $\theta$. Thus, $T^* = t^*(S) = E(T \mid S)$ is an estimator that is a function
of $S$, and furthermore,

$$E(T^*) = E_S(T^*) = E_S[E(T \mid S)] = E(T) = \tau(\theta)$$

by Theorem 5.4.1. From Theorem 5.4.3,

$$\mathrm{Var}(T) = \mathrm{Var}[E(T \mid S)] + E[\mathrm{Var}(T \mid S)] \ge \mathrm{Var}[E(T \mid S)] = \mathrm{Var}(T^*)$$

with equality if and only if $E[\mathrm{Var}(T \mid S)] = 0$, which occurs if and only if
$\mathrm{Var}(T \mid S) = 0$ with probability 1, or equivalently $T = E(T \mid S) = T^*$.
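The following is a small exact illustration of the Rao-Blackwell theorem for Bernoulli data. The choices $n = 4$, $p = 1/3$, and the crude unbiased estimator $T = X_1$ are illustrative and do not come from the text. Conditioning on the sufficient statistic $S = \sum X_i$ gives $T^* = E(X_1 \mid S) = S/n$. The variance drops from $p(1 - p)$ to $p(1 - p)/n$ while the mean stays at $p$. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def rao_blackwell_bernoulli(n, p):
    """Exact check for iid Bernoulli(p): T = X_1 is unbiased for p,
    S = sum(X_i) is sufficient, and T* = E(T | S) is computed by conditioning."""
    outcomes = list(product((0, 1), repeat=n))
    prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in outcomes}

    # t_star[s] = E(X_1 | S = s), found by direct conditioning
    t_star = {}
    for s in range(n + 1):
        block = [x for x in outcomes if sum(x) == s]
        mass = sum(prob[x] for x in block)
        t_star[s] = sum(prob[x] * x[0] for x in block) / mass

    mean_t = sum(prob[x] * x[0] for x in outcomes)
    var_t = sum(prob[x] * (x[0] - mean_t) ** 2 for x in outcomes)
    mean_ts = sum(prob[x] * t_star[sum(x)] for x in outcomes)
    var_ts = sum(prob[x] * (t_star[sum(x)] - mean_ts) ** 2 for x in outcomes)
    return t_star, mean_t, var_t, mean_ts, var_ts


t_star, mean_t, var_t, mean_ts, var_ts = rao_blackwell_bernoulli(4, Fraction(1, 3))
print(t_star)           # E(X_1 | S = s) = s/4 for s = 0, ..., 4
print(mean_t, mean_ts)  # both unbiased for p = 1/3
print(var_t, var_ts)    # p(1 - p) = 2/9 versus p(1 - p)/4 = 1/18
```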


It is clear from the Rao-Blackwell theorem that if we are searching for an
unbiased estimator with small variance, we may as well restrict attention to functions
of sufficient statistics. If any unbiased estimator exists, then there will be
one that is a function of sufficient statistics, namely $E(T \mid S)$, which also is
unbiased and has variance at least as small. In particular, we still are
interested in knowing how to find a UMVUE for a parameter, and the above
theorem narrows our problem down somewhat. For example, consider a one-parameter
model $f(x; \theta)$, and assume that a single sufficient statistic, $S$, exists. We
know we must consider only unbiased functions of $S$ in searching for a UMVUE.
In some cases it may be possible to show that only one function of $S$ is unbiased,
and in that case we would know that it is a UMVUE. The concept of "completeness"
is helpful in determining unique unbiased estimators, and this concept is
defined in the next section.


10.4 COMPLETENESS AND EXPONENTIAL CLASS


Definition 10.4.1


Completeness A family of density functions $\{f_T(t; \theta); \theta \in \Omega\}$ is called complete if
$E[u(T)] = 0$ for all $\theta \in \Omega$ implies $u(T) = 0$ with probability 1 for all $\theta \in \Omega$.


This sometimes is expressed by saying that there are no nontrivial unbiased
estimators of zero. In particular, it means that two different functions of $T$ cannot
have the same expected value for all $\theta$. For example, if $E[u_1(T)] = \tau(\theta)$ and $E[u_2(T)]
= \tau(\theta)$, then $E[u_1(T) - u_2(T)] = 0$, which implies $u_1(T) - u_2(T) = 0$, or $u_1(T)
= u_2(T)$ with probability 1, if the family of density functions is complete. That is,
an unbiased estimator that is a function of $T$ is unique in this case. We primarily are interested in
knowing that the family of density functions of a sufficient statistic is complete,
because in that case an unbiased function of the sufficient statistic will be unique,
and it must be a UMVUE by the Rao-Blackwell theorem.
A sufficient statistic the density of which is a member of a complete family of density functions will be referred to as a complete sufficient statistic.


Theorem 10.4.1 Lehmann-Scheffé Let $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$, and let $S$ be a vector of jointly complete sufficient statistics for $\theta$. If $T^* = t^*(S)$ is a statistic that is unbiased for $\tau(\theta)$ and a function of $S$, then $T^*$ is a UMVUE of $\tau(\theta)$.


Proof 


It follows by completeness that any statistic that is a function of $S$ and an unbiased estimator of $\tau(\theta)$ must be equal to $T^*$ with probability 1. If $T$ is any other statistic that is an unbiased estimator of $\tau(\theta)$, then by the Rao-Blackwell theorem $E(T \mid S)$ also is unbiased for $\tau(\theta)$ and a function of $S$, so by uniqueness, $T^* = E(T \mid S)$ with probability 1. Furthermore, $\operatorname{Var}(T^*) \le \operatorname{Var}(T)$ for all $\theta$. Thus $T^*$ is a UMVUE of $\tau(\theta)$.


Example 10.4.1 Let $X_1, \ldots, X_n$ denote a random sample from a Poisson distribution, $X_i \sim \text{POI}(\mu)$, so that

$$f(x; \mu) = \frac{e^{-\mu}\mu^x}{x!}, \qquad x = 0, 1, 2, \ldots$$

By the factorization criterion, $S = \sum X_i$ is a sufficient statistic. We know that $S \sim \text{POI}(n\mu)$, and we can show that a Poisson family is complete. For convenience let $\theta = n\mu$, and consider any function $u(s)$. We have

$$E[u(S)] = \sum_{s=0}^{\infty} \frac{u(s)\, e^{-\theta}\theta^s}{s!}$$

Because $e^{-\theta} \neq 0$, setting $E[u(S)] = 0$ for all $\theta > 0$ requires the power series $\sum_s [u(s)/s!]\,\theta^s$ to vanish identically, so all of its coefficients, $u(s)/s!$, must be zero. But $u(s)/s! = 0$ implies $u(s) = 0$. By completeness, $\bar X = S/n$ is the unique function of $S$ that is unbiased for $E(X) = \mu$, and by Theorem 10.4.1 it must be a UMVUE of $\mu$.


This particular result also can be verified by comparing $\operatorname{Var}(\bar X)$ to the CRLB; however, the CRLB approach will not work for a nonlinear function of $S$. The present approach, on the other hand, can be used to find the UMVUE of $\tau(\theta) = E[u(S)]$, for any function $u(s)$ for which the expected value exists. For example, in the Poisson case, $E(\bar X^2) = \mu^2 + \mu/n$, so that $\bar X^2 = (S/n)^2$ is the UMVUE of $\mu^2 + \mu/n$. It also follows that $\bar X^2 - \bar X/n = (S/n)^2 - S/n^2$ is the UMVUE of $\mu^2$. If a UMVUE is desired for any specified $\tau(\mu)$, it is only necessary to find some function of $S$ that is unbiased for $\tau(\mu)$; then that will be the UMVUE. If there is difficulty in finding a $u(s)$ such that $E[u(S)] = \tau(\mu)$, one possibility is to find any function, $h(X_1, \ldots, X_n) = T$, that is unbiased, and then $E(T \mid S)$ will be an unbiased estimator that is a function of $S$. Thus, use of complete sufficient statistics and the Rao-Blackwell theorem provides one possible systematic approach for attempting to find UMVUEs in certain cases.
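The Poisson claims above can be checked exactly, without simulation. The sketch below does two things. First, it evaluates $E[u(S)]$ as the truncated series for $S \sim \text{POI}(n\mu)$, confirming that $(S/n)^2$ has mean $\mu^2 + \mu/n$ and that $(S/n)^2 - S/n^2$ has mean $\mu^2$. Second, it Rao-Blackwellizes the crude unbiased estimator $T = X_1$ by computing $E(X_1 \mid S = s)$ from the joint pmf, which should give $s/n$. The helper names, the truncation length, and the values $n = 8$, $\mu = 2.5$ are illustrative assumptions.

```python
from math import exp, factorial


def expect_u_of_s(u, theta: float, terms: int = 200) -> float:
    """E[u(S)] for S ~ POI(theta), i.e. sum_s u(s) e^{-theta} theta^s / s!.

    The series is truncated after `terms` terms; for moderate theta the
    neglected tail is far below double precision.
    """
    total, p = 0.0, exp(-theta)      # p holds P[S = s]
    for s in range(terms):
        total += u(s) * p
        p *= theta / (s + 1)         # P[S = s + 1] from P[S = s]
    return total


def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)


def cond_mean_x1_given_s(s: int, n: int, mu: float) -> float:
    """E(X_1 | S = s): X_1 ~ POI(mu) is independent of S - X_1 ~ POI((n - 1) mu),
    and S ~ POI(n mu), so the conditional pmf is joint / marginal."""
    num = sum(x * poisson_pmf(x, mu) * poisson_pmf(s - x, (n - 1) * mu)
              for x in range(s + 1))
    return num / poisson_pmf(s, n * mu)


n, mu = 8, 2.5
print(expect_u_of_s(lambda s: (s / n) ** 2, n * mu), mu**2 + mu / n)
print(expect_u_of_s(lambda s: (s / n) ** 2 - s / n**2, n * mu), mu**2)
print([round(cond_mean_x1_given_s(s, n, mu), 6) for s in range(5)])
```

The last line shows $E(X_1 \mid S = s) = s/n$ for every $s$, whatever $\mu$ is. This is the Rao-Blackwell route arriving at $\bar X$.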

Note that completeness is a property of a family of densities, and the family 
must be large enough or “complete” to enjoy this property. That is, there may be 
a nonzero u(S) whose mean is zero for some densities, but this situation may not 
hold if more densities are added to the family. If one considers a single Poisson distribution, say $\mu = 1$, then $E(S - n) = 0$, and a family consisting of this single Poisson density function is not complete because $u(s) = s - n \neq 0$ if $s \neq n$.
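A short numerical illustration of why the family must be "large enough" follows. The function $u(s) = s - n$ is nonzero, yet it has mean zero when $\mu = 1$. Adding the density with $\mu = 2$ to the family destroys that property. The sketch assumes $n = 4$ and sums the truncated Poisson series; the helper name `expect_u_of_s` is hypothetical.

```python
from math import exp


def expect_u_of_s(u, theta: float, terms: int = 200) -> float:
    """E[u(S)] for S ~ POI(theta), truncating the series after `terms` terms."""
    total, p = 0.0, exp(-theta)
    for s in range(terms):
        total += u(s) * p
        p *= theta / (s + 1)
    return total


n = 4


def u(s: int) -> int:
    """Candidate nontrivial unbiased estimator of zero, u(s) = s - n."""
    return s - n


print(expect_u_of_s(u, theta=n * 1.0))   # mu = 1: E[S - n] = 0 (up to rounding)
print(expect_u_of_s(u, theta=n * 2.0))   # mu = 2: E[S - n] = n, not 0
```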

If the range of the random variable does not depend on parameters, then one 
may essentially restrict attention to families of densities that fall in the form of the 
“exponential class” when considering complete sufficient statistics, so we need not 
consider these families individually in detail. 


Definition 10.4.2 


Exponential Class A density function is said to be a member of the regular exponential class if it can be expressed in the form

$$f(x; \boldsymbol{\theta}) = c(\boldsymbol{\theta})\, h(x) \exp\left[\sum_{j=1}^{k} q_j(\boldsymbol{\theta})\, t_j(x)\right], \qquad x \in A \qquad (10.4.1)$$

and zero otherwise, where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$ is a vector of $k$ unknown parameters, if the parameter space has the form

$$\Omega = \{\boldsymbol{\theta} \mid a_i < \theta_i < b_i,\ i = 1, \ldots, k\}$$

(note that $a_i = -\infty$ and $b_i = \infty$ are permissible values), and if it satisfies regularity conditions 1, 2, and 3a or 3b given by:

1. The set $A = \{x : f(x; \boldsymbol{\theta}) > 0\}$ does not depend on $\boldsymbol{\theta}$.

2. The functions $q_j(\boldsymbol{\theta})$ are nontrivial, functionally independent, continuous functions of the $\theta_i$.

3a. For a continuous random variable, the derivatives $t_j'(x)$ are linearly independent continuous functions of $x$ over $A$.

3b. For a discrete random variable, the $t_j(x)$ are nontrivial functions of $x$ on $A$, and none is a linear function of the others.

For convenience, we will write that $f(x; \boldsymbol{\theta})$ is a member of REC$(q_1, \ldots, q_k)$ or simply REC.
Example 10.4.2 Consider a Bernoulli distribution, $X \sim \text{BIN}(1, p)$. It follows that

$$f(x; p) = p^x (1-p)^{1-x} = (1-p)\exp\{x \ln[p/(1-p)]\}, \qquad x \in A = \{0, 1\}$$

which is REC$(q_1)$ with $q_1(p) = \ln[p/(1-p)]$ and $t_1(x) = x$.
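The factorization is easy to verify numerically. The sketch below compares $p^x(1-p)^{1-x}$ with $c(p)h(x)\exp[q_1(p)t_1(x)]$, using $c(p) = 1 - p$, $h(x) = 1$, $q_1(p) = \ln[p/(1-p)]$ and $t_1(x) = x$, at a few arbitrary values of $p$. The function names are illustrative only.

```python
from math import exp, log


def bernoulli_pmf(x: int, p: float) -> float:
    return p**x * (1 - p) ** (1 - x)


def bernoulli_rec_form(x: int, p: float) -> float:
    """c(p) h(x) exp[q1(p) t1(x)] with c(p) = 1 - p, h(x) = 1,
    q1(p) = ln[p/(1 - p)] and t1(x) = x."""
    q1 = log(p / (1 - p))
    return (1 - p) * exp(q1 * x)


for p in (0.1, 0.5, 0.83):
    for x in (0, 1):
        print(p, x, bernoulli_pmf(x, p), bernoulli_rec_form(x, p))
```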


Note that the notion of REC, with slightly modified regularity conditions, can be extended to the case where $X$ is a vector.

It can be shown that the REC is a complete family for the special case when $t_j(x) = x$. Many of the common density functions, such as the binomial, Poisson, exponential, gamma, and normal pdf's, are in the form of the REC, but we are particularly interested in knowing that the pdf's of the sufficient statistics from these models are complete. If a random sample is considered from a member of the REC, then a set of joint sufficient statistics is identified readily by the factorization criterion; moreover, the pdf of these sufficient statistics also turns out to be in the special form of a (possibly multivariate) REC, and therefore they are complete sufficient statistics.


Theorem 10.4.2 If $X_1, \ldots, X_n$ is a random sample from a member of the regular exponential class REC$(q_1, \ldots, q_k)$, then the statistics

$$S_1 = \sum_{i=1}^{n} t_1(X_i), \quad \ldots, \quad S_k = \sum_{i=1}^{n} t_k(X_i)$$

are a minimal set of complete sufficient statistics for $\theta_1, \ldots, \theta_k$.


Example 10.4.3 Consider the previous example, $X \sim \text{BIN}(1, p)$. For a random sample of size $n$, $t_1(x) = x$, and $S = \sum_{i=1}^{n} X_i$ is a complete sufficient statistic for $p$.

If we desire a UMVUE of $\operatorname{Var}(X) = p(1-p)$, we might try $\bar X(1 - \bar X)$. Now

$$\begin{aligned} E[\bar X(1 - \bar X)] &= E(\bar X) - E(\bar X^2) \\ &= p - [p^2 + \operatorname{Var}(\bar X)] \\ &= p - p^2 - p(1-p)/n \\ &= p(1-p)(1 - 1/n) \end{aligned}$$

and thus $E[n\bar X(1 - \bar X)/(n-1)] = p(1-p)$, and this gives the UMVUE of $p(1-p)$ as $c\bar X(1 - \bar X)$ where $c = n/(n-1)$.
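The bias factor $1 - 1/n$ and the correction $c = n/(n-1)$ can be confirmed by summing exactly over the distribution $S = n\bar X \sim \text{BIN}(n, p)$. This is a sketch; `expected_estimator` and the values of $n$ and $p$ are illustrative choices.

```python
from math import comb


def expected_estimator(n: int, p: float, c: float) -> float:
    """E[c * Xbar * (1 - Xbar)] computed exactly, with S = n * Xbar ~ BIN(n, p)."""
    total = 0.0
    for s in range(n + 1):
        xbar = s / n
        total += c * xbar * (1 - xbar) * comb(n, s) * p**s * (1 - p) ** (n - s)
    return total


n, p = 10, 0.3
print(expected_estimator(n, p, 1.0))          # p(1-p)(1-1/n), about 0.189
print(expected_estimator(n, p, n / (n - 1)))  # p(1-p), about 0.21
```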
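Example 10.4.4 If $X \sim N(\mu, \sigma^2)$, then

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\mu^2}{2\sigma^2}\right) \exp\left(-\frac{1}{2\sigma^2}\,x^2 + \frac{\mu}{\sigma^2}\,x\right)$$

which is REC$(q_1, q_2)$ with $t_1(x) = x^2$ and $t_2(x) = x$. For a random sample of size $n$, it is clear that $S_1 = \sum X_i^2$ and $S_2 = \sum X_i$ are jointly complete and sufficient statistics for $\mu$ and $\sigma^2$. Because the MLEs are one-to-one functions of $S_1$ and $S_2$, they also could be used as the jointly complete sufficient statistics here.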
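The one-to-one correspondence between $(S_1, S_2)$ and the MLEs $(\hat\mu, \hat\sigma^2)$ is simple to write out: $\hat\mu = S_2/n$ and $\hat\sigma^2 = S_1/n - \hat\mu^2$, and conversely $S_2 = n\hat\mu$ and $S_1 = n(\hat\sigma^2 + \hat\mu^2)$. The sketch below uses made-up data and hypothetical helper names. It checks the forward map against the standard library's mean and population variance.

```python
from statistics import fmean, pvariance


def mles_from_sufficient(s1: float, s2: float, n: int) -> tuple[float, float]:
    """(S1, S2) = (sum x_i^2, sum x_i)  ->  normal MLEs (mu_hat, sigma2_hat).

    The textbook formula S1/n - mu_hat^2 is fine for illustration but can
    lose precision when the mean is large relative to the spread.
    """
    mu_hat = s2 / n
    return mu_hat, s1 / n - mu_hat**2


def sufficient_from_mles(mu_hat: float, sigma2_hat: float, n: int) -> tuple[float, float]:
    """Inverse map, showing the correspondence is one-to-one."""
    return n * (sigma2_hat + mu_hat**2), n * mu_hat


data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]   # hypothetical observations
s1 = sum(x * x for x in data)
s2 = sum(data)
mu_hat, sigma2_hat = mles_from_sufficient(s1, s2, len(data))
print(mu_hat, fmean(data))
print(sigma2_hat, pvariance(data))
print(sufficient_from_mles(mu_hat, sigma2_hat, len(data)), (s1, s2))
```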


It can be shown that under mild regularity conditions, families of density func- 
tions that admit k-dimensional sufficient statistics for all sample sizes must be 
k-parameter RECs. 

Thus, for the regular case, for most practical purposes Theorem 10.4.2 covers
all of the models that admit complete sufficient statistics, and there is no point in 
attempting to find complete sufficient statistics in the regular case for models that 
are not in the REC form. 

We have seen that a close connection exists among the REC, complete suffi- 
cient statistics, and UMVUEs. Also, MLEs are functions of minimal sufficient 
statistics, and the MLEs are asymptotically efficient with asymptotic variance 
being the CRLB. If we call an estimator whose variance achieves the CRLB a CRLB estimator, then the following theorems can be stated.


Theorem 10.4.3 If a CRLB estimator $T$ exists for $\tau(\theta)$, then a single sufficient statistic exists, and $T$ is a function of the sufficient statistic. Conversely, if a single sufficient statistic exists and the CRLB exists, then a CRLB estimator exists for some $\tau(\theta)$.


Theorem 10.4.4 If the CRLB exists, then a CRLB estimator will exist for some function $\tau(\theta)$ if and only if the density function is a member of the REC. Furthermore, the CRLB estimator of $\tau(\theta)$ will be $\tau(\hat\theta)$, where $\hat\theta$ is the MLE of $\theta$.


Most pdf's of practical interest that are not included in the REC belong to another general class, which allows the range of $X$, denoted by $A = \{x : f(x; \boldsymbol{\theta}) > 0\}$, to depend on $\boldsymbol{\theta}$.
Definition 10.4.3


A density function is said to be a member of the range-dependent exponential class, denoted by RDEC$(q_1, \ldots, q_k)$, if it satisfies regularity conditions 2 and 3a or 3b of Definition 10.4.2, for $j = 3, \ldots, k$, and if it has the form

$$f(x; \boldsymbol{\theta}) = c(\boldsymbol{\theta})\, h(x) \exp\left[\sum_{j=3}^{k} q_j(\theta_3, \ldots, \theta_k)\, t_j(x)\right] \qquad (10.4.2)$$

where $A = \{x \mid q_1(\theta_1, \theta_2) < x < q_2(\theta_1, \theta_2)\}$ and $\boldsymbol{\theta} \in \Omega$.


We will include as special cases the following:

1. The one-parameter case, where
$$f(x; \theta) = c(\theta)\, h(x) \qquad (10.4.3)$$
with
$$A = \{x \mid q_1(\theta) < x < q_2(\theta)\}$$

2. The two-parameter case, where
$$f(x; \theta_1, \theta_2) = c(\theta_1, \theta_2)\, h(x) \qquad (10.4.4)$$
with
$$A = \{x \mid q_1(\theta_1, \theta_2) < x < q_2(\theta_1, \theta_2)\}$$


The following theorem is useful in identifying sufficient statistics in the range- 
dependent case. 


Theorem 10.4.5 Let $X_1, \ldots, X_n$ be a random sample from a member of the RDEC$(q_1, \ldots, q_k)$.

1. If $k > 2$, then $S_1 = X_{1:n}$, $S_2 = X_{n:n}$, and $S_3, \ldots, S_k$, where $S_j = \sum_{i=1}^{n} t_j(X_i)$, are jointly sufficient for $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$.

2. In the two-parameter case, $S_1 = X_{1:n}$ and $S_2 = X_{n:n}$ are jointly sufficient for $\boldsymbol{\theta} = (\theta_1, \theta_2)$.

3. In the one-parameter case, $S_1 = X_{1:n}$ and $S_2 = X_{n:n}$ are jointly sufficient for $\theta$. If $q_1(\theta)$ is increasing and $q_2(\theta)$ is decreasing, then $T_1 = \min[q_1^{-1}(X_{1:n}),\ q_2^{-1}(X_{n:n})]$ is a single sufficient statistic for $\theta$. If $q_1(\theta)$ is decreasing and $q_2(\theta)$ is increasing, then $T_2 = \max[q_1^{-1}(X_{1:n}),\ q_2^{-1}(X_{n:n})]$ is a single sufficient statistic for $\theta$.


If one of the limits is constant and the other depends on a single parameter, say $\theta$, then the following theorem can be stated.
Theorem 10.4.6 Suppose that $X_1, \ldots, X_n$ is a random sample from a member of the RDEC.

1. If $k > 2$ and the lower limit is constant, say $q_1(\boldsymbol{\theta}) = a$, then $X_{n:n}$ and the statistics $\sum_{i=1}^{n} t_j(X_i)$ are jointly sufficient for $\theta$ and $\theta_j$; $j = 3, \ldots, k$. If the upper limit is constant, say $q_2(\boldsymbol{\theta}) = b$, then $X_{1:n}$ and the statistics $\sum_{i=1}^{n} t_j(X_i)$ are jointly sufficient for $\theta$ and $\theta_j$; $j = 3, \ldots, k$.

2. In the one-parameter case, if $q_1(\theta)$ does not depend on $\theta$, then $S_2 = X_{n:n}$ is sufficient for $\theta$, and if $q_2(\theta)$ does not depend on $\theta$, then $S_1 = X_{1:n}$ is sufficient for $\theta$.


Example 10.4.5 Consider the pdf

$$f(x; \theta) = \frac{1}{2\theta}, \qquad -\theta < x < \theta$$

and zero otherwise. We have $q_1(\theta) = -\theta$, a decreasing function, and $q_2(\theta) = \theta$, an increasing function of $\theta$. Thus, by Theorem 10.4.5, $T_2 = \max[-X_{1:n},\ X_{n:n}]$ is a single sufficient statistic for $\theta$.


Example 10.4.6 Consider a two-parameter exponential distribution, $X \sim \text{EXP}(\theta, \eta)$:

$$f(x; \theta, \eta) = \frac{1}{\theta}\exp\left[-\frac{x - \eta}{\theta}\right] = \frac{1}{\theta}\, e^{\eta/\theta} e^{-x/\theta}, \qquad \eta < x < \infty$$

If $X_1, \ldots, X_n$ is a random sample, then it follows from Theorem 10.4.6 that $X_{1:n}$ and $\sum X_i$ are joint sufficient statistics for $(\theta, \eta)$. Because $q_2(\eta) = \infty$ is not a function of parameters, $X_{n:n}$ is not involved.

Suppose that $\theta$ is known, say $\theta = 1$. Then

$$f(x; \eta) = e^{-(x - \eta)} = e^{\eta} e^{-x}, \qquad \eta < x < \infty$$

We see that $X_{1:n}$ is sufficient for $\eta$. This is consistent with earlier results, where we found that estimators of $\eta$ based on $X_{1:n}$ were better than estimators based on other statistics, such as $\bar X$, for this model.


Example 10.4.7 Consider a random sample of size $n$ from a uniform distribution, $X_i \sim \text{UNIF}(\theta_1, \theta_2)$. Because

$$f(x; \theta_1, \theta_2) = \frac{1}{\theta_2 - \theta_1}, \qquad \theta_1 < x < \theta_2$$

it follows from Theorem 10.4.5 that $X_{1:n}$ and $X_{n:n}$ are jointly sufficient for $(\theta_1, \theta_2)$.
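Example 10.4.6's remark that estimators of $\eta$ based on $X_{1:n}$ beat those based on $\bar X$ can be illustrated by Monte Carlo. Take $\theta = 1$. Then $X_{1:n} - \eta \sim \text{EXP}(1/n)$, so $X_{1:n} - 1/n$ is unbiased with MSE $1/n^2$, while $\bar X - 1$ is unbiased with MSE $1/n$. These two particular estimators, $\eta = 3$, $n = 10$, the replication count, and the seed are assumptions chosen for the sketch.

```python
import random
from statistics import fmean


def simulate_mse(eta: float, n: int, reps: int, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo MSEs of X_{1:n} - 1/n and Xbar - 1 as estimators of eta,
    for X_i ~ EXP(1, eta), i.e. X_i = eta + (standard exponential)."""
    rng = random.Random(seed)
    se_min, se_mean = [], []
    for _ in range(reps):
        xs = [eta + rng.expovariate(1.0) for _ in range(n)]
        se_min.append((min(xs) - 1 / n - eta) ** 2)
        se_mean.append((fmean(xs) - 1 - eta) ** 2)
    return fmean(se_min), fmean(se_mean)


mse_min, mse_mean = simulate_mse(eta=3.0, n=10, reps=20000)
print(mse_min, mse_mean)   # theory: 1/n**2 = 0.01 versus 1/n = 0.1
```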
The previous results deal only with finding sufficient statistics for members of
the RDEC. These statistics also may be complete, but this must be verified by a 
separate argument. 


Example 10.4.8 Consider a random sample of size $n$ from a uniform distribution, $X_i \sim \text{UNIF}(0, \theta)$. It follows from the previous theorem that $X_{n:n}$ is sufficient for $\theta$. The pdf of $S = X_{n:n}$ is

$$f(s; \theta) = \frac{n s^{n-1}}{\theta^n}, \qquad 0 < s < \theta$$

and zero otherwise.

To verify completeness, assume that $E[u(S)] = 0$ for all $\theta > 0$, which means that

$$\int_0^{\theta} u(s)\, \frac{n s^{n-1}}{\theta^n}\, ds = 0$$

If we multiply by $\theta^n$ and differentiate with respect to $\theta$, then $u(\theta)\, n\theta^{n-1} = 0$ for all $\theta > 0$, which implies $u(s) = 0$ for all $s > 0$, and thus $S$ is a complete sufficient statistic for $\theta$.
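The argument above rests on the pdf $f(s; \theta) = n s^{n-1}/\theta^n$. Integrating it gives the cdf $P[S \le s] = (s/\theta)^n$ for $0 < s < \theta$. A quick Monte Carlo sketch compares the empirical cdf of the sample maximum with that formula. The choices $\theta = 4$, $n = 5$, the evaluation points, and the seed are arbitrary.

```python
import random


def empirical_cdf_of_max(theta: float, n: int, points, reps: int = 20000, seed: int = 7):
    """Empirical P[X_{n:n} <= s] at each s in `points`, for X_i ~ UNIF(0, theta)."""
    rng = random.Random(seed)
    maxima = [max(rng.uniform(0.0, theta) for _ in range(n)) for _ in range(reps)]
    return [sum(m <= s for m in maxima) / reps for s in points]


theta, n = 4.0, 5
points = [1.0, 2.0, 3.0, 3.5]
emp = empirical_cdf_of_max(theta, n, points)
exact = [(s / theta) ** n for s in points]
for s, e, f in zip(points, emp, exact):
    print(s, round(e, 4), round(f, 4))
```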


The following interesting theorem sometimes is useful for establishing certain 
distributional results. 


Theorem 10.4.7 Basu Let $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \boldsymbol{\theta})$; $\boldsymbol{\theta} \in \Omega$. Suppose that $S = (S_1, \ldots, S_k)$ where $S_1, \ldots, S_k$ are jointly complete sufficient statistics for $\boldsymbol{\theta}$, and suppose that $T$ is any other statistic. If the distribution of $T$ does not involve $\boldsymbol{\theta}$, then $S$ and $T$ are stochastically independent.


Proof 


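We will consider the discrete case. Denote by $f(t)$, $f(s; \boldsymbol{\theta})$, and $f(t \mid s)$ the pdf's of $T$, $S$, and the conditional pdf of $T$ given $S = s$, respectively. Because $S$ is sufficient, $f(t \mid s)$ does not involve $\boldsymbol{\theta}$, and by assumption neither does $f(t)$; hence, for each fixed $t$, $f(t) - f(t \mid S)$ is a statistic. Consider its expected value relative to the distribution of $S$:

$$\begin{aligned} E_S[f(t) - f(t \mid S)] &= f(t) - \sum_s f(t \mid s)\, f(s; \boldsymbol{\theta}) \\ &= f(t) - \sum_s f(s, t; \boldsymbol{\theta}) \\ &= f(t) - f(t) = 0 \end{aligned}$$

for all $\boldsymbol{\theta} \in \Omega$. Because $S$ is a complete sufficient statistic, $f(t \mid s) = f(t)$ with probability 1, which means that $S$ and $T$ are stochastically independent.

The continuous case is similar.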


Example 10.4.9 Consider a random sample of size $n$ from a normal distribution, $X_i \sim N(\mu, \sigma^2)$, and consider the MLEs, $\hat\mu = \bar X$ and $\hat\sigma^2 = \sum (X_i - \bar X)^2/n$. It is easy to verify that $\bar X$ is a complete sufficient statistic for $\mu$, for fixed values of $\sigma^2$. Also

$$\frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2(n-1)$$

which does not depend on $\mu$. It follows that $\bar X$ and $\hat\sigma^2$ are independent random variables. Also, $\bar X$ and $\hat\sigma^2$ are jointly complete sufficient statistics for $\mu$ and $\sigma^2$, and quantities of the form $(X_i - \bar X)/\hat\sigma$ are distributed independently of $\mu$ and $\sigma^2$, so these quantities are stochastically independent of $\bar X$ and $\hat\sigma^2$.
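Simulation cannot prove independence, but it can check one consequence of it: independent variables are uncorrelated. The converse does not hold. The sketch simulates normal samples, where Basu's theorem applies, and exponential samples, where the distribution of $\hat\sigma^2$ depends on the parameter and the theorem does not apply. It then reports the correlation between $\bar X$ and $\hat\sigma^2$ in each case. The distributions, $n = 10$, the replication count, and the seed are assumptions.

```python
import random
from statistics import fmean


def corr(xs, ys):
    """Sample correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_and_mle_variance(sample):
    m = fmean(sample)
    return m, sum((x - m) ** 2 for x in sample) / len(sample)


def simulate(draw, n: int = 10, reps: int = 20000, seed: int = 3) -> float:
    """Correlation between Xbar and sigma2_hat over `reps` simulated samples."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        m, v = mean_and_mle_variance([draw(rng) for _ in range(n)])
        means.append(m)
        variances.append(v)
    return corr(means, variances)


r_normal = simulate(lambda rng: rng.gauss(5.0, 2.0))
r_expon = simulate(lambda rng: rng.expovariate(1.0))
print(r_normal, r_expon)   # near 0 for the normal, clearly positive for the exponential
```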


SUMMARY 


Our purpose in this chapter was to introduce the concepts of sufficiency and 
completeness. Generally speaking, a statistic provides a reduction of a set of data from some distribution to a more concise form. If a statistic is sufficient, then it contains, in a certain sense, all of the "information" in the data concerning an unknown parameter of the distribution. Although sufficiency can be verified directly from the definition, at least theoretically, this usually can be accomplished more easily by using the factorization criterion.

If a statistic is sufficient and a unique MLE exists, then the MLE is a function 
of the sufficient statistic. Sufficient statistics also are important in the construc- 
tion of UMVUEs. If a statistic is complete as well as sufficient for a parameter, 
and if an unbiased estimator of the parameter (or a function of the parameter) 
exists, then a UMVUE exists and it is a function of the complete sufficient sta- 
tistic. It often is difficult to verify completeness directly from the definition, but a 
special class of pdf’s, known as the exponential class, provides a convenient way 
to identify complete sufficient statistics. 


EXERCISES 
1. Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution, $X_i \sim \text{POI}(\mu)$. Verify that $S = \sum_{i=1}^{n} X_i$ is sufficient for $\mu$ by using equation (10.2.1).

2. Consider a random sample of size $n$ from a geometric distribution, $X_i \sim \text{GEO}(p)$. Use equation (10.2.1) to show that $S = \sum X_i$ is sufficient for $p$.

3. Suppose that $X_1, \ldots, X_n$ is a random sample from a normal distribution, $X_i \sim N(0, \theta)$. Show that equation (10.2.1) does not depend on $\theta$ if $S = \sum X_i^2$.
4. Consider a random sample of size $n$ from a two-parameter exponential distribution, $X_i \sim \text{EXP}(1, \eta)$. Show that $S = X_{1:n}$ is sufficient for $\eta$ by using equation (10.2.1).

5. Let $X_1, \ldots, X_n$ be a random sample from a gamma distribution, $X_i \sim \text{GAM}(\theta, 2)$. Show that $S = \sum X_i$ is sufficient for $\theta$
(a) by using equation (10.2.1),
(b) by the factorization criterion of equation (10.2.3).

6. Suppose that $X_1, X_2, \ldots, X_n$ are independent, with $X_i \sim \text{BIN}(m_i, p)$; $i = 1, 2, \ldots, n$. Show that $S = \sum X_i$ is sufficient for $p$ by the factorization criterion.

7. Let $X_1, X_2, \ldots, X_n$ be independent with $X_i \sim \text{NB}(r_i, p)$. Find a sufficient statistic for $p$.

8. Rework Exercise 4 using the factorization criterion.

9. Consider a random sample of size $n$ from a Weibull distribution, $X_i \sim \text{WEI}(\theta, \beta)$.
(a) Find a sufficient statistic for $\theta$ with $\beta$ known, say $\beta = 2$.
(b) If $\beta$ is unknown, can you find a single sufficient statistic for $\beta$?

10. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution, $X_i \sim N(\mu, \sigma^2)$.
(a) Find a single sufficient statistic for $\mu$ with $\sigma^2$ known.
(b) Find a single sufficient statistic for $\sigma^2$ with $\mu$ known.

11. Consider a random sample of size $n$ from a uniform distribution, $X_i \sim \text{UNIF}(\theta_1, \theta_2)$.
(a) Show that $X_{1:n}$ is sufficient for $\theta_1$, if $\theta_2$ is known.
(b) Show that $X_{1:n}$ and $X_{n:n}$ are jointly sufficient for $\theta_1$ and $\theta_2$.

12. Let $X_1, \ldots, X_n$ be a random sample from a two-parameter exponential distribution, $X_i \sim \text{EXP}(\theta, \eta)$. Show that $X_{1:n}$ and $\bar X$ are jointly sufficient for $\theta$ and $\eta$.

13. Suppose that $X_1, \ldots, X_n$ is a random sample from a beta distribution, $X_i \sim \text{BETA}(\theta_1, \theta_2)$. Find joint sufficient statistics for $\theta_1$ and $\theta_2$.

14. Consider a random sample of size $n$ from a uniform distribution, $X_i \sim \text{UNIF}(\theta, 2\theta)$; $\theta > 0$. Can you find a single sufficient statistic for $\theta$? Can you find a pair of jointly sufficient statistics for $\theta$?

15. For the random sample of Exercise 2, find the estimator of $p$ obtained by maximizing the pdf of $S = \sum X_i$, and compare this with the usual MLE of $p$.

16. For the random variables $X_1, \ldots, X_n$ in Exercise 7, find the MLE of $p$ by maximizing the pdf of the sufficient statistic. Is this the same as the usual MLE? Explain why this result is expected.


17. Consider the sufficient statistic, $S = X_{1:n}$, of Exercise 4.
(a) Show that $S$ also is complete.
(b) Verify that $X_{1:n} - 1/n$ is the UMVUE of $\eta$.
(c) Find the UMVUE of the $p$th percentile.


18. Let $X \sim N(0, \theta)$; $\theta > 0$.
(a) Show that $X^2$ is complete and sufficient for $\theta$.
(b) Show that $N(0, \theta)$ is not a complete family.

19. Show that $N(\mu, \mu^2)$ does not belong to the regular exponential class.

20. Show that the following families of distributions belong to the regular exponential class, and for each case use this information to find complete sufficient statistics based on a random sample $X_1, \ldots, X_n$.
(a) $\text{BIN}(1, p)$; $0 < p < 1$.
(b) $\text{POI}(\mu)$; $\mu > 0$.
(c) $\text{NB}(r, p)$; $r$ known, $0 < p < 1$.
(d) $N(\mu, \sigma^2)$; $-\infty < \mu < \infty$, $\sigma^2 > 0$.
(e) $\text{EXP}(\theta)$; $\theta > 0$.
(f) $\text{GAM}(\theta, \kappa)$; $\theta > 0$, $\kappa > 0$.
(g) $\text{BETA}(\theta_1, \theta_2)$; $\theta_1 > 0$, $\theta_2 > 0$.
(h) $\text{WEI}(\theta, \beta)$; $\beta$ known, $\theta > 0$.

21. Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli distribution, $X_i \sim \text{BIN}(1, p)$; $0 < p < 1$.
(a) Find the UMVUE of $\operatorname{Var}(X) = p(1-p)$.
(b) Find the UMVUE of $p^2$.

22. Consider a random sample of size $n$ from a Poisson distribution, $X_i \sim \text{POI}(\mu)$; $\mu > 0$. Find the UMVUE of $P[X = 0] = e^{-\mu}$. Hint: Recall Exercise 33(g) of Chapter 9.

23. Suppose that $X_1, \ldots, X_n$ is a random sample from a normal distribution, $X_i \sim N(\mu, 9)$.
(a) Find the UMVUE of the 95th percentile.
(b) Find the UMVUE of $P[X \le c]$ where $c$ is a known constant.
Hint: Find the conditional distribution of $X_1$ given $\bar X = \bar x$ and apply the Rao-Blackwell theorem with $T = u(X_1)$, where $u(x_1) = 1$ if $x_1 \le c$, and zero otherwise.

24. If $X \sim \text{POI}(\mu)$, show that $S = (-1)^X$ is the UMVUE of $e^{-2\mu}$. Is this a reasonable estimator?

25. Consider a random sample of size $n$ from a distribution with pdf $f(x; \theta) = \theta x^{\theta - 1}$ if $0 < x < 1$ and zero otherwise; $\theta > 0$.
(a) Find the UMVUE of $1/\theta$. Hint: $E[-\ln X] = 1/\theta$.
(b) Find the UMVUE of $\theta$.

26. For the random sample of Exercise 11, show that the jointly sufficient statistics $X_{1:n}$ and $X_{n:n}$ also are complete. Suppose that it is desired to estimate the mean $\mu = (\theta_1 + \theta_2)/2$. Find the UMVUE of $\mu$. Hint: First find the expected values $E(X_{1:n})$ and $E(X_{n:n})$ and show that $(X_{1:n} + X_{n:n})/2$ is unbiased for the mean.
27. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution, $X_i \sim N(\mu, \sigma^2)$.
(a) Find the UMVUE of $\sigma^2$.
(b) Find the UMVUE of the 95th percentile.
Hint: Recall Exercise 20 of Chapter 9.


28. Use Theorems 10.4.5 and 10.4.6 to find sufficient statistics for the parameters of the 
distributions in Exercises 5 and 6(b) of Chapter 9. 


29. Consider a random sample of size $n$ from a gamma distribution, $X_i \sim \text{GAM}(\theta, \kappa)$, and let $\bar X = (1/n)\sum X_i$ and $\tilde X = (\prod X_i)^{1/n}$ be the sample mean and geometric mean, respectively.
(a) Show that $\bar X$ and $\tilde X$ are jointly complete and sufficient for $\theta$ and $\kappa$.
(b) Find the UMVUE of $\mu = \theta\kappa$.
(c) Find the UMVUE of x”. 
(d) Show that the distribution of $T = \bar X/\tilde X$ does not depend on $\theta$.
(e) Show that $\bar X$ and $T$ are stochastically independent random variables.
(f) Show that the conditional pdf of $\bar X$ given $\tilde X = \tilde x$ does not depend on $\kappa$.


30. Consider a random sample of size $n$ from a two-parameter exponential distribution, $X_i \sim \text{EXP}(\theta, \eta)$. Recall from Exercise 12 that $X_{1:n}$ and $\bar X$ are jointly sufficient for $\theta$ and $\eta$. Because $X_{1:n}$ is complete and sufficient for $\eta$ for each fixed value of $\theta$, argue from Theorem 10.4.7 that $X_{1:n}$ and $T = X_{1:n} - \bar X$ are stochastically independent.
(a) Find the MLE $\hat\theta$ of $\theta$.
(b) Find the UMVUE of $\eta$.
(c) Show that the conditional pdf of $X_{1:n}$ given $\bar X$ does not depend on $\theta$.
(d) Show that the distribution of $Q = (X_{1:n} - \eta)/\hat\theta$ is free of $\eta$ and $\theta$.


31. Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a distribution with pdf


es Of G(LE STEAM 00 < x 
S08: =) a 


(a) Find the MLE $\hat\theta$ of $\theta$.
(b) Find a complete sufficient statistic for $\theta$.
(c) Find the CRLB for $1/\theta$.
(d) Find the UMVUE of $1/\theta$.
(e) Find the asymptotic normal distribution for $\hat\theta$ and also for $\tau(\hat\theta) = 1/\hat\theta$.
(f) Find the UMVUE of $\theta$.


32. Consider a random sample of size $n$ from a distribution with pdf

$$f(x; \theta) = \begin{cases} \dfrac{(\ln\theta)^x}{\theta\, x!} & x = 0, 1, \ldots;\ \theta > 1 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find a complete sufficient statistic for $\theta$.
(b) Find the MLE of $\theta$.
(c) Find the CRLB for $\theta$.
(d) Find the UMVUE of $\ln\theta$.
(e) Find the UMVUE of $(\ln\theta)^2$.
(f) Find the CRLB for $(\ln\theta)^2$.

33. Suppose that only the first $r$ order statistics are observed, based on a random sample of size $n$ from an exponential distribution, $X_i \sim \text{EXP}(\theta)$. In other words, we have a Type II censored sample.
(a) Find the MLE of $\theta$ based only on $X_{1:n}, \ldots, X_{r:n}$.
(b) Relative to these order statistics, find a complete sufficient statistic for $\theta$.

11

INTERVAL ESTIMATION


INTRODUCTION


The problem of point estimation was discussed in Chapter 9. Along with a point estimate of the value of a parameter, we want to have some understanding of how close we can expect our estimate to be to the true value. Some information on this question is provided by knowing the variance or the MSE of the estimator. Another approach would be to consider interval estimates; one then could consider the probability that such an interval will contain the true parameter value. Indeed, one could adjust the interval to achieve some prescribed probability level, and thus a measure of its accuracy would be incorporated automatically into the interval estimate.


Example 11.1.1 In Example 4.6.3, the observed lifetimes (in months) of 40 electrical parts were given, and we argued that an exponential distribution of lifetimes might be reasonable. Consequently, we will assume that the data are the observed values of a random sample of size $n = 40$ from an exponential distribution, $X_i \sim \text{EXP}(\theta)$, where $\theta$ is the mean lifetime. Recall that in Example 9.3.4 we found that the sample mean, $\bar X$, is the UMVUE of $\theta$. For the given set of data, the estimate of $\theta$ is $\bar x = 93.1$ months. Although we know that this estimate is based on an estimator with optimal properties, a point estimate in itself does not provide information about accuracy. Our solution to this problem will be to derive an interval whose endpoints are random variables that include the true value of $\theta$ between them with probability near 1, for example, 0.95.

It was noted in Example 9.3.2 that $2n\bar X/\theta \sim \chi^2(2n)$, and we know that percentiles of the chi-square distribution are given in Table 4 (Appendix C). For example, with $n = 40$ and $\nu = 80$, we find that $\chi^2_{0.025}(80) = 57.15$ and $\chi^2_{0.975}(80) = 106.63$. It follows that $P[57.15 < 80\bar X/\theta < 106.63] = 0.975 - 0.025 = 0.95$, and consequently

$$P[80\bar X/106.63 < \theta < 80\bar X/57.15] = 0.95$$

In general, an interval with random endpoints will be called a random interval. In particular, the interval $(80\bar X/106.63,\ 80\bar X/57.15)$ is a random interval that contains the true value of $\theta$ with probability 0.95. If we now replace $\bar X$ with the estimate $\bar x = 93.1$, then the resulting interval is $(69.9, 130.3)$. We will refer to this interval as a 95% confidence interval for $\theta$. Because the estimated interval has known endpoints, it is not appropriate to say that it contains the true value of $\theta$ with probability 0.95. That is, the parameter $\theta$, although unknown, is a constant, and this particular interval either does or does not contain $\theta$. However, the fact that the associated random interval had probability 0.95 prior to estimation might lead us to assert that we are "95% confident" that $69.9 < \theta < 130.3$.
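The arithmetic of this example is reproduced below. The tabled percentiles 57.15 and 106.63 are taken as given inputs, so no chi-square routine is assumed, and `exponential_ci` is a hypothetical helper name.

```python
def exponential_ci(xbar: float, n: int, chi2_lower: float, chi2_upper: float):
    """Interval (2n xbar / chi2_upper, 2n xbar / chi2_lower), obtained by
    inverting P[chi2_lower < 2n Xbar / theta < chi2_upper] = gamma."""
    return 2 * n * xbar / chi2_upper, 2 * n * xbar / chi2_lower


lo, hi = exponential_ci(93.1, 40, 57.15, 106.63)
print(round(lo, 2), round(hi, 2))   # 69.85 130.32
```

The computed endpoints are about 69.85 and 130.32. These agree with the interval $(69.9, 130.3)$ reported above except in the last digit. The difference presumably comes from rounding of $\bar x$ and of the tabled percentiles.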

The rest of the chapter will include a formal definition of confidence intervals 
and a discussion of general methods for deriving confidence intervals. 


CONFIDENCE INTERVALS 


Let $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$; $\theta \in \Omega$, where $\Omega$ is an interval. Suppose that $L$ and $U$ are statistics, say $L = \ell(X_1, \ldots, X_n)$ and $U = u(X_1, \ldots, X_n)$. If an experiment yields data $x_1, \ldots, x_n$, then we have observed values $\ell(x_1, \ldots, x_n)$ and $u(x_1, \ldots, x_n)$.
Definition 11.2.1


Confidence Interval An interval $(\ell(x_1, \ldots, x_n),\ u(x_1, \ldots, x_n))$ is called a $100\gamma\%$ confidence interval for $\theta$ if

$$P[\ell(X_1, \ldots, X_n) < \theta < u(X_1, \ldots, X_n)] = \gamma \qquad (11.2.1)$$

where $0 < \gamma < 1$. The observed values $\ell(x_1, \ldots, x_n)$ and $u(x_1, \ldots, x_n)$ are called lower and upper confidence limits, respectively.


Other notations that often are encountered in the statistical literature are $\theta_L$ and $\theta_U$ for lower and upper confidence limits, respectively. We also sometimes will use the abbreviated notations $\ell(\mathbf{x}) = \ell(x_1, \ldots, x_n)$ and $u(\mathbf{x}) = u(x_1, \ldots, x_n)$ to denote the observed limits.

Strictly speaking, a distinction should be made between the random interval $(L, U)$ and the observed interval $(\ell(\mathbf{x}), u(\mathbf{x}))$ as mentioned previously. This situation is analogous to the distinction in point estimation between an estimator and an estimate. Other terminology, which is useful in maintaining this distinction, is to call $(L, U)$ an interval estimator and $(\ell(\mathbf{x}), u(\mathbf{x}))$ an interval estimate. The probability level, $\gamma$, also is called the confidence coefficient or confidence level.

Perhaps the most common interpretation of a confidence interval is based on the relative frequency property of probability. Specifically, if such interval estimates are computed from many different samples, then in the long run we would expect approximately $100\gamma\%$ of the intervals to include the true value of $\theta$. That is, our confidence is in the method, and because of Definition 11.2.1, the confidence level reflects the long-term frequency interpretation of probability.
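The relative-frequency interpretation can be seen directly by simulation. The sketch repeatedly draws exponential samples of size 40, forms the interval $(80\bar x/106.63,\ 80\bar x/57.15)$ of Example 11.1.1, and counts how often it covers the true $\theta$. The two values of $\theta$, the replication count, and the seed are arbitrary choices. The tabled percentiles are used as given.

```python
import random
from statistics import fmean


def coverage(theta: float, n: int = 40, reps: int = 10000, seed: int = 11,
             q_lo: float = 57.15, q_hi: float = 106.63) -> float:
    """Fraction of simulated samples whose interval (2n xbar/q_hi, 2n xbar/q_lo)
    contains theta, for X_i ~ EXP(theta) with mean theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.expovariate(1 / theta) for _ in range(n))
        lower, upper = 2 * n * xbar / q_hi, 2 * n * xbar / q_lo
        hits += lower < theta < upper
    return hits / reps


print(coverage(theta=93.1), coverage(theta=5.0))   # both near 0.95
```

Each individual interval either covers $\theta$ or does not. The long-run proportion of intervals that cover it is what comes out near 0.95.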

It often is desirable to have either a lower or an upper confidence limit, but not both.


Definition 11.2.2
One-Sided Confidence Limits


1. If
$$P[\ell(X_1, \ldots, X_n) < \theta] = \gamma \qquad (11.2.2)$$
then $\ell(\mathbf{x}) = \ell(x_1, \ldots, x_n)$ is called a one-sided lower $100\gamma\%$ confidence limit for $\theta$.

2. If
$$P[\theta < u(X_1, \ldots, X_n)] = \gamma \qquad (11.2.3)$$
then $u(\mathbf{x}) = u(x_1, \ldots, x_n)$ is called a one-sided upper $100\gamma\%$ confidence limit for $\theta$.
It may not always be clear how to obtain confidence limits that satisfy Definition 11.2.1 or 11.2.2. The concept of sufficiency often offers some aid in this problem. If a single sufficient statistic $S$ exists, then one might consider finding confidence limits that are functions of $S$. Otherwise, another reasonable statistic, such as an MLE, might be considered.


Example 11.2.1 We take a random sample of size $n$ from an exponential distribution, $X_i \sim \text{EXP}(\theta)$, and we wish to derive a one-sided lower $100\gamma\%$ confidence limit for $\theta$. We know that $\bar X$ is sufficient for $\theta$ and also that $2n\bar X/\theta \sim \chi^2(2n)$. As mentioned in Chapter 8, $p$th percentiles, $\chi^2_p(\nu)$, are provided in Table 4 (Appendix C). Thus,

$$\gamma = P[2n\bar X/\theta < \chi^2_\gamma(2n)] = P[2n\bar X/\chi^2_\gamma(2n) < \theta]$$

If $\bar x$ is observed, then a one-sided lower $100\gamma\%$ confidence limit is given by

$$\ell(\mathbf{x}) = 2n\bar x/\chi^2_\gamma(2n) \qquad (11.2.4)$$

Similarly, a one-sided upper $100\gamma\%$ confidence limit is given by

$$u(\mathbf{x}) = 2n\bar x/\chi^2_{1-\gamma}(2n) \qquad (11.2.5)$$


Notice that in the case of an upper limit we must use the value $1 - \gamma$ rather than $\gamma$ when we read Table 4. For example, if a one-sided upper 90% confidence limit is desired, then $1 - 0.90 = 0.10$. For a sample of size $n = 40$, the required percentile is $\chi^2_{0.10}(80) = 64.28$, and the desired upper confidence limit has the form $u(\mathbf{x}) = 80\bar x/64.28$.
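Table 4 can be reproduced for the even degrees of freedom that arise here ($\nu = 2n$). The sketch uses the gamma-Poisson identity $P[\chi^2(2m) \le x] = P[\text{POI}(x/2) \ge m]$, inverts the cdf by bisection, and then evaluates formulas (11.2.4) and (11.2.5). The helper names are hypothetical, and the shortcut deliberately refuses odd degrees of freedom.

```python
from math import exp


def chi2_cdf_even(x: float, df: int) -> float:
    """P[chi2(df) <= x] for even df = 2m, computed as P[POI(x/2) >= m]."""
    if df % 2:
        raise ValueError("this shortcut needs an even number of degrees of freedom")
    m, lam = df // 2, x / 2
    term, below = exp(-lam), 0.0          # term = P[POI(lam) = k]
    for k in range(m):
        below += term
        term *= lam / (k + 1)
    return 1.0 - below                    # 1 - P[POI(lam) <= m - 1]


def chi2_ppf_even(p: float, df: int) -> float:
    """Percentile chi2_p(df) by bisection on the cdf."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sided_limits(xbar: float, n: int, gamma: float):
    """Lower limit (11.2.4) and upper limit (11.2.5) at level gamma."""
    lower = 2 * n * xbar / chi2_ppf_even(gamma, 2 * n)
    upper = 2 * n * xbar / chi2_ppf_even(1 - gamma, 2 * n)
    return lower, upper


n = 40
for p in (0.025, 0.10, 0.975):
    print(p, round(chi2_ppf_even(p, 2 * n), 2))   # compare with Table 4
print(one_sided_limits(93.1, n, 0.90))
```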

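Suppose that we want a $100\gamma\%$ confidence interval for $\theta$. If we choose values $\alpha_1 > 0$ and $\alpha_2 > 0$ such that $\alpha_1 + \alpha_2 = \alpha = 1 - \gamma$, then it follows that

$$P[\chi^2_{\alpha_1}(2n) < 2n\bar X/\theta < \chi^2_{1-\alpha_2}(2n)] = 1 - \alpha_1 - \alpha_2$$

and thus

$$P[2n\bar X/\chi^2_{1-\alpha_2}(2n) < \theta < 2n\bar X/\chi^2_{\alpha_1}(2n)] = \gamma$$

It is common in practice to let $\alpha_1 = \alpha_2$, which is known as the equal-tailed choice, and this would imply $\alpha_1 = \alpha_2 = \alpha/2$. The corresponding confidence interval has the form

$$\left(2n\bar x/\chi^2_{1-\alpha/2}(2n),\ 2n\bar x/\chi^2_{\alpha/2}(2n)\right) \qquad (11.2.6)$$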


Generally speaking, for a prescribed confidence level, we want to use a method that produces an interval with some optimal property such as minimal length. Actually, the length, $U - L$, of the corresponding random interval generally will be a random variable, so a criterion such as minimum expected length might be more appropriate. For some problems, the equal-tailed choice of $\alpha_1$ and $\alpha_2$ will provide the minimum expected length, but for others it will not. For example, interval (11.2.6) of the previous example does not have this property (see Exercise 26).
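Because $E(\bar X) = \theta$, the expected length of $(2n\bar X/\chi^2_{1-\alpha_2}(2n),\ 2n\bar X/\chi^2_{\alpha_1}(2n))$ is $2n\theta\,[1/\chi^2_{\alpha_1}(2n) - 1/\chi^2_{1-\alpha_2}(2n)]$. The sketch scans a grid of $\alpha_1$ values with $\alpha_1 + \alpha_2 = 0.05$ and $n = 40$. It shows that some unequal split gives a shorter expected length than the equal-tailed choice. The grid search only approximates the optimum. The chi-square helpers repeat the even-degrees-of-freedom shortcut (gamma-Poisson identity) so that the block is self-contained, and all names are hypothetical.

```python
from math import exp


def chi2_cdf_even(x: float, df: int) -> float:
    """P[chi2(df) <= x] for even df = 2m, computed as P[POI(x/2) >= m]."""
    m, lam = df // 2, x / 2
    term, below = exp(-lam), 0.0
    for k in range(m):
        below += term
        term *= lam / (k + 1)
    return 1.0 - below


def chi2_ppf_even(p: float, df: int) -> float:
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_length(alpha1: float, alpha: float, n: int, theta: float = 1.0) -> float:
    """E[U - L] for the interval with tail areas alpha1 and alpha2 = alpha - alpha1."""
    a = chi2_ppf_even(alpha1, 2 * n)
    b = chi2_ppf_even(1 - (alpha - alpha1), 2 * n)
    return 2 * n * theta * (1 / a - 1 / b)


n, alpha = 40, 0.05
equal = expected_length(alpha / 2, alpha, n)
grid = [k / 1000 for k in range(1, 50)]
best_alpha1 = min(grid, key=lambda a1: expected_length(a1, alpha, n))
best = expected_length(best_alpha1, alpha, n)
print(best_alpha1, round(best, 4), round(equal, 4))
```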


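Example 11.2.2 Consider a random sample from a normal distribution, $X_i \sim N(\mu, \sigma^2)$, where $\sigma^2$ is assumed to be known. In this case $\bar X$ is sufficient for $\mu$, and it is known that $Z = \sqrt{n}(\bar X - \mu)/\sigma \sim N(0, 1)$. By symmetry, we also know that $z_{\alpha/2} = -z_{1-\alpha/2}$, and thus

$$\begin{aligned} 1 - \alpha &= P[-z_{1-\alpha/2} < \sqrt{n}(\bar X - \mu)/\sigma < z_{1-\alpha/2}] \\ &= P[\bar X - z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar X + z_{1-\alpha/2}\,\sigma/\sqrt{n}] \end{aligned}$$

It follows that a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by

$$\left(\bar x - z_{1-\alpha/2}\,\sigma/\sqrt{n},\ \bar x + z_{1-\alpha/2}\,\sigma/\sqrt{n}\right) \qquad (11.2.7)$$

For example, for a 95% confidence interval, $1 - \alpha/2 = 0.975$ and the upper and lower confidence limits are $\bar x \pm 1.96\,\sigma/\sqrt{n}$.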

Notice that this solution is not acceptable if $\sigma^2$ is unknown, because the confidence limits then would depend on an unknown parameter and could not be computed. With a slightly modified derivation it will be possible to obtain a confidence interval for $\mu$, even if $\sigma^2$ is an unknown "nuisance parameter." Indeed, a major difficulty in determining confidence intervals arises in multiparameter cases where unknown nuisance parameters are present. A general method that often provides a way of dealing with this problem is presented in the next section.

In multiparameter cases it also may be desirable to have a "joint confidence region" that applies to all parameters simultaneously. Also, a confidence region for a single parameter, in the one-dimensional case, could be some set other than an interval. In general, if $\boldsymbol{\theta} \in \Omega$, then any region $A_\gamma(x_1, \ldots, x_n)$ in $\Omega$ is a $100\gamma\%$ confidence region if the probability is $\gamma$ that $A_\gamma(X_1, \ldots, X_n)$ contains the true value of $\boldsymbol{\theta}$.


PIVOTAL QUANTITY METHOD 


Suppose that $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$, and we wish to obtain confidence limits for $\theta$ where other unknown nuisance parameters also may be present.
Definition 11.3.1


Pivotal Quantity If $Q = q(X_1, \ldots, X_n; \theta)$ is a random variable that is a function only of $X_1, \ldots, X_n$ and $\theta$, then $Q$ is called a pivotal quantity if its distribution does not depend on $\theta$ or any other unknown parameters.


In Example 11.2.1, we encountered a chi-square distributed random variable, which will be denoted here as $Q = 2n\bar{X}/\theta$, and which clearly satisfies the definition of a pivotal quantity. In that example we were able to proceed from a probability statement about $Q$ to obtain confidence limits for $\theta$.

More generally, if $Q$ is a pivotal quantity for a parameter $\theta$ and if percentiles of $Q$, say $q_1$ and $q_2$, are available such that

$$P[q_1 < q(X_1, \ldots, X_n; \theta) < q_2] = \gamma \qquad (11.3.1)$$

then for an observed sample, $x_1, \ldots, x_n$, a $100\gamma\%$ confidence region for $\theta$ is the set of $\theta \in \Omega$ that satisfy

$$q_1 < q(x_1, \ldots, x_n; \theta) < q_2 \qquad (11.3.2)$$


Such a confidence region will not necessarily be an interval, and in general it might be quite complicated. However, in some rather important situations confidence intervals can be obtained. One general situation that will always yield an interval occurs when, for each fixed set of values $x_1, \ldots, x_n$, the function $q(x_1, \ldots, x_n; \theta)$ is a monotonic increasing (or decreasing) function of $\theta$. It also is possible to identify certain types of distributions that will admit pivotal quantities. Specifically, Chapter 3 included a discussion of location and scale parameter models, which include most of the special distributions we have considered. Recall that a parameter $\theta$ is a location parameter if the pdf has the form $f(x; \theta) = f_0(x - \theta)$, and it is a scale parameter if it has the form $f(x; \theta) = (1/\theta) f_0(x/\theta)$, where $f_0(z)$ is a pdf that is free of unknown parameters (including $\theta$). In the case of location-scale parameters, say $\theta_1$ and $\theta_2$, the pdf has the form $f(x; \theta_1, \theta_2) = (1/\theta_2) f_0[(x - \theta_1)/\theta_2]$. If MLEs exist in any of these cases, then they can be used to form pivotal quantities.


Theorem 11.3.1 Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f(x; \theta)$ for $\theta \in \Omega$, and assume that an MLE $\hat{\theta}$ exists.

1. If $\theta$ is a location parameter, then $Q = \hat{\theta} - \theta$ is a pivotal quantity.
2. If $\theta$ is a scale parameter, then $Q = \hat{\theta}/\theta$ is a pivotal quantity.


We already have seen examples of pivotal quantities that are slight variations of the ones suggested in this theorem. Specifically, recall Example 11.2.2, where $X_i \sim N(\mu, \sigma^2)$. With $\sigma^2$ known, $\mu$ is a location parameter, and the MLE is $\bar{X}$; thus $\bar{X} - \mu$ is a pivotal quantity. In Example 11.2.1, $X_i \sim EXP(\theta)$, so that $\theta$ is a scale parameter and the MLE is $\bar{X}$; thus $\bar{X}/\theta$ is a pivotal quantity. Notice that it sometimes is convenient to make a slight modification, such as multiplying by a known scale factor, so that the pivotal quantity has a known distribution. For example, we know that $2n\bar{X}/\theta \sim \chi^2(2n)$, which has tabulated percentiles, so it might be better to let this be our pivotal quantity rather than $\bar{X}/\theta$.
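The Python sketch below turns the pivot $2n\bar{X}/\theta \sim \chi^2(2n)$ into the two-sided interval $(2n\bar{x}/\chi^2_{1-\alpha/2}(2n),\ 2n\bar{x}/\chi^2_{\alpha/2}(2n))$. It does not use a chi-square table. Instead, it relies on the closed form of the chi-square CDF for even degrees of freedom (a Poisson tail sum) and inverts that CDF by bisection. The helper names and the sample are hypothetical.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-square CDF for an even number of degrees of freedom df = 2k, using
    P[chi2(2k) <= x] = 1 - sum_{j<k} exp(-x/2) (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = math.exp(-half)  # j = 0 term of the Poisson sum
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - total


def chi2_ppf_even(p, df):
    """Percentile chi2_p(df) for even df, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exp_mean_interval(xs, alpha=0.05):
    """Two-sided interval for theta when X_i ~ EXP(theta), from 2n*Xbar/theta ~ chi2(2n)."""
    n = len(xs)
    total = sum(xs)  # equals n * Xbar
    lower = 2 * total / chi2_ppf_even(1 - alpha / 2, 2 * n)
    upper = 2 * total / chi2_ppf_even(alpha / 2, 2 * n)
    return lower, upper


data = [0.5, 1.2, 3.1, 2.4, 0.8, 1.9, 4.0, 2.2, 1.5, 2.4]  # hypothetical sample, n = 10
print(exp_mean_interval(data))
```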


Theorem 11.3.2 Let $X_1, \ldots, X_n$ be a random sample from a distribution with location-scale parameters

$$f(x; \theta_1, \theta_2) = \frac{1}{\theta_2} f_0\left(\frac{x - \theta_1}{\theta_2}\right)$$

If MLEs $\hat{\theta}_1$ and $\hat{\theta}_2$ exist, then $(\hat{\theta}_1 - \theta_1)/\hat{\theta}_2$ and $\hat{\theta}_2/\theta_2$ are pivotal quantities for $\theta_1$ and $\theta_2$, respectively.


We will not prove this theorem here, but details are provided by Antle and 
Bain (1969). 

Notice also that $(\hat{\theta}_1 - \theta_1)/\theta_2$ has a distribution that is free of unknown parameters, but it is not a pivotal quantity unless $\theta_2$ is known.

If sufficient statistics exist, then MLEs can be found that are functions of them, and the method should provide good results.


Consider a random sample from a normal distribution, $X_i \sim N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. If $\hat{\mu}$ and $\hat{\sigma}$ are the MLEs of $\mu$ and $\sigma$, then $(\hat{\mu} - \mu)/\hat{\sigma}$ and $\hat{\sigma}/\sigma$ are pivotal quantities, which could be used to derive confidence intervals for each parameter with the other considered as an unknown nuisance parameter. It will be convenient to express the results in terms of the unbiased estimator $S^2 = n\hat{\sigma}^2/(n - 1)$ to take advantage of some known distributional properties, namely

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1) \qquad (11.3.3)$$

and

$$\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1) \qquad (11.3.4)$$
If $t_{1-\alpha/2} = t_{1-\alpha/2}(n - 1)$ is the $(1 - \alpha/2)$th percentile of the $t$ distribution with $n - 1$ degrees of freedom, then

$$1 - \alpha = P\left[-t_{1-\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{1-\alpha/2}\right] = P\left[\bar{X} - t_{1-\alpha/2}\,S/\sqrt{n} < \mu < \bar{X} + t_{1-\alpha/2}\,S/\sqrt{n}\right]$$

which means that a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by

$$\left(\bar{x} - t_{1-\alpha/2}\,s/\sqrt{n},\ \bar{x} + t_{1-\alpha/2}\,s/\sqrt{n}\right) \qquad (11.3.5)$$

with observed values $\bar{x}$ and $s$.
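Similarly, if $\chi^2_{\alpha/2} = \chi^2_{\alpha/2}(n - 1)$ and $\chi^2_{1-\alpha/2} = \chi^2_{1-\alpha/2}(n - 1)$ are the $(\alpha/2)$th and $(1 - \alpha/2)$th percentiles of the chi-square distribution with $n - 1$ degrees of freedom, then

$$1 - \alpha = P\left[\chi^2_{\alpha/2} < \frac{(n - 1)S^2}{\sigma^2} < \chi^2_{1-\alpha/2}\right] = P\left[\frac{(n - 1)S^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(n - 1)S^2}{\chi^2_{\alpha/2}}\right]$$

and a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is given by

$$\left(\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}},\ \frac{(n - 1)s^2}{\chi^2_{\alpha/2}}\right) \qquad (11.3.6)$$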


Also, confidence limits for $\sigma$ are obtained by computing the positive square roots of these limits.
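The following is a minimal Python sketch of intervals (11.3.5) and (11.3.6). It avoids any dependence on a statistics package by assuming the percentiles are read from tables and passed in. The values used below are standard table entries: $t_{0.975}(9) = 2.262$, $\chi^2_{0.025}(9) = 2.700$, and $\chi^2_{0.975}(9) = 19.023$. The function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def t_interval_mean(xs, t_pct):
    """Interval (11.3.5); t_pct is the tabulated t_{1-alpha/2}(n-1)."""
    n = len(xs)
    xbar, s = mean(xs), math.sqrt(variance(xs))  # variance() uses divisor n - 1
    half_width = t_pct * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def chi2_interval_var(xs, chi2_lower_pct, chi2_upper_pct):
    """Interval (11.3.6) for sigma^2; pass chi2_{alpha/2}(n-1) and chi2_{1-alpha/2}(n-1)."""
    n = len(xs)
    ss = (n - 1) * variance(xs)  # (n - 1) s^2
    return ss / chi2_upper_pct, ss / chi2_lower_pct


# Hypothetical sample of size n = 10, with 95% table values for 9 degrees of freedom
xs = [9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.0, 10.3, 10.0]
mu_ci = t_interval_mean(xs, t_pct=2.262)
var_ci = chi2_interval_var(xs, chi2_lower_pct=2.700, chi2_upper_pct=19.023)
sd_ci = tuple(math.sqrt(v) for v in var_ci)  # limits for sigma via positive square roots
print(mu_ci, var_ci, sd_ci)
```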


In general, if $(\theta_L, \theta_U)$ is a $100\gamma\%$ confidence interval for a parameter $\theta$, and if $\tau(\theta)$ is a monotonic increasing function of $\theta \in \Omega$, then $(\tau(\theta_L), \tau(\theta_U))$ is a $100\gamma\%$ confidence interval for $\tau(\theta)$.


In Example 9.2.13, the computation of MLEs for the parameters of a Weibull distribution $X_i \sim WEI(\theta, \beta)$ was discussed. Although the Weibull distribution is not a location-scale model, it is not difficult to show that $Y_i = \ln X_i$ has an extreme-value distribution that is a location-scale model. Specifically,

$$f(y; \theta_1, \theta_2) = \frac{1}{\theta_2} f_0\left(\frac{y - \theta_1}{\theta_2}\right) \qquad (11.3.7)$$

where $f_0(z) = \exp(z - e^z)$. The relationship between parameters is $\theta_1 = \ln \theta$ and $\theta_2 = 1/\beta$, and thus

$$Q_1 = \hat{\beta} \ln(\hat{\theta}/\theta) = \frac{\hat{\theta}_1 - \theta_1}{\hat{\theta}_2} \qquad (11.3.8)$$

and

$$Q_2 = \hat{\beta}/\beta = (\hat{\theta}_2/\theta_2)^{-1} \qquad (11.3.9)$$

are pivotal quantities for $\theta$ and $\beta$. Because the MLEs must be computed by iterative methods for this model, there is no known way to derive the exact distributions of $Q_1$ and $Q_2$, but percentiles can be obtained by computer simulation. Tables of these percentiles and derivations of the confidence intervals are given by Bain and Engelhardt (1991). Approximate distributions are given in Chapter 16.
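The Python sketch below illustrates the simulation idea under stated assumptions. It is not the procedure used to build the published tables.

* The MLE $\hat{\beta}$ is found by bisection on the usual Weibull likelihood equation.
* `random.Random.weibullvariate(theta, beta)` is used with $\theta$ as the scale and $\beta$ as the shape, matching $F(x) = 1 - \exp[-(x/\theta)^\beta]$.
* The function names, sample size, and replication count are hypothetical.

Because $Q_1$ and $Q_2$ are pivotal, rerunning with the same seed but different true $(\theta, \beta)$ should reproduce the same simulated values up to rounding. This assumes, as in CPython, that each draw uses a single uniform transformed by the inverse CDF.

```python
import math
import random


def weibull_mle(xs):
    """MLEs (beta_hat, theta_hat) for WEI(theta, beta), F(x) = 1 - exp[-(x/theta)^beta].

    beta_hat solves sum(x^b ln x)/sum(x^b) - 1/b - mean(ln x) = 0; the left side is
    increasing in b, so bisection applies (assumes the observations are not all equal)."""
    n = len(xs)
    x_max = max(xs)
    logs = [math.log(x) for x in xs]
    ys = [x / x_max for x in xs]  # rescaling keeps powers <= 1 and leaves the ratio unchanged
    mean_log = sum(logs) / n

    def score(b):
        w = [y ** b for y in ys]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    lo, hi = 1.0, 1.0
    while score(lo) > 0:  # move lo left until the score is negative
        lo /= 2.0
    while score(hi) < 0:  # move hi right until the score is positive
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    b_hat = 0.5 * (lo + hi)
    # theta_hat^b_hat = mean(x^b_hat), evaluated on the log scale for stability
    log_theta = math.log(x_max) + math.log(sum(y ** b_hat for y in ys) / n) / b_hat
    return b_hat, math.exp(log_theta)


def simulate_pivots(n, theta, beta, reps, seed):
    """Monte Carlo draws of Q1 = beta_hat*ln(theta_hat/theta) and Q2 = beta_hat/beta."""
    rng = random.Random(seed)
    q1, q2 = [], []
    for _ in range(reps):
        xs = [rng.weibullvariate(theta, beta) for _ in range(n)]  # scale theta, shape beta
        b_hat, t_hat = weibull_mle(xs)
        q1.append(b_hat * math.log(t_hat / theta))
        q2.append(b_hat / beta)
    return q1, q2


def empirical_percentile(values, p):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


q1, q2 = simulate_pivots(n=10, theta=1.0, beta=1.0, reps=1000, seed=1)
print([round(empirical_percentile(q, p), 3) for q in (q1, q2) for p in (0.05, 0.95)])
```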


It may not always be possible to find a pivotal quantity based on MLEs, but 
for a sample from a continuous distribution with a single unknown parameter, at 
least one pivotal quantity can always be derived by use of the probability integral 
transform. 

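If $X_i \sim f(x; \theta)$ and if $F(x; \theta)$ is the CDF of $X_i$, then it follows from Theorem 6.3.3 that $F(X_i; \theta) \sim UNIF(0, 1)$, and consequently $Y_i = -\ln F(X_i; \theta) \sim EXP(1)$. Because each $2Y_i \sim \chi^2(2)$ and the $Y_i$ are independent, for a random sample $X_1, \ldots, X_n$ it follows that

$$-2\sum_{i=1}^{n} \ln F(X_i; \theta) \sim \chi^2(2n) \qquad (11.3.10)$$

so that

$$P\left[\chi^2_{\alpha/2}(2n) < -2\sum_{i=1}^{n} \ln F(X_i; \theta) < \chi^2_{1-\alpha/2}(2n)\right] = 1 - \alpha \qquad (11.3.11)$$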


and inverting this statement will provide a confidence region for $\theta$. If the CDF is not in closed form or if it is too complicated, then the inversion may have to be done numerically. If $F(x; \theta)$ is a monotonic increasing (or decreasing) function of $\theta$, then the resulting confidence region will be an interval. Notice also that $1 - F(X_i; \theta) \sim UNIF(0, 1)$, and

$$-2\sum_{i=1}^{n} \ln[1 - F(X_i; \theta)] \sim \chi^2(2n) \qquad (11.3.12)$$

In general, expressions (11.3.10) and (11.3.12) will give different intervals, and perhaps computational convenience would be a reasonable criterion for choosing between them.


Consider a random sample from a Pareto distribution, $X_i \sim PAR(1, \kappa)$. The CDF is

$$F(x; \kappa) = 1 - (1 + x)^{-\kappa}, \quad x > 0$$

If we use equation (11.3.12), then $-\ln[1 - F(x; \kappa)] = \kappa \ln(1 + x)$, so

$$2\kappa \sum_{i=1}^{n} \ln(1 + X_i) \sim \chi^2(2n)$$

and a $100(1 - \alpha)\%$ confidence interval has the form

$$\left(\frac{\chi^2_{\alpha/2}(2n)}{2\sum_{i=1}^{n} \ln(1 + x_i)},\ \frac{\chi^2_{1-\alpha/2}(2n)}{2\sum_{i=1}^{n} \ln(1 + x_i)}\right)$$

The solution based on equation (11.3.10) would be much harder because the resulting inequality would have to be solved numerically.
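The following is a minimal Python sketch of the Pareto interval. The even-degrees-of-freedom chi-square helpers are repeated so the block stands alone. The function names and the data are hypothetical.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-square CDF for even df = 2k: 1 - sum_{j<k} exp(-x/2) (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - total


def chi2_ppf_even(p, df):
    """Percentile chi2_p(df) for even df, by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pareto_kappa_interval(xs, alpha=0.05):
    """Interval for kappa in PAR(1, kappa), from 2*kappa*sum(ln(1 + X_i)) ~ chi2(2n)."""
    n = len(xs)
    t = sum(math.log1p(x) for x in xs)  # sum of ln(1 + x_i)
    return (chi2_ppf_even(alpha / 2, 2 * n) / (2 * t),
            chi2_ppf_even(1 - alpha / 2, 2 * n) / (2 * t))


sample = [0.3, 1.7, 0.9, 0.1, 2.5, 0.6, 0.4, 1.2, 0.8, 0.2]  # hypothetical, n = 10
print(pareto_kappa_interval(sample))
```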


For discrete distributions, and for some multiparameter problems, a pivotal 
quantity may not exist. However, an approximate pivotal quantity often can be 
obtained based on asymptotic results. The normal approximation to the binomial 
distribution, as discussed in Chapter 7, is an example. 


APPROXIMATE CONFIDENCE INTERVALS 


Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f(x; \theta)$. As noted in Chapter 9, MLEs are asymptotically normal under certain conditions.


Consider a random sample from a Bernoulli distribution, $X_i \sim BIN(1, p)$. The MLE of $p$ is $\hat{p} = \sum X_i/n$. We also know that $\hat{p}$ is sufficient and that $\sum X_i \sim BIN(n, p)$, but there is no pivotal quantity for $p$. However, by the CLT,

$$\frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \xrightarrow{d} Z \sim N(0, 1) \qquad (11.3.13)$$

and consequently for large $n$,

$$P\left[-z_{1-\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} < z_{1-\alpha/2}\right] \approx 1 - \alpha \qquad (11.3.14)$$

This approximation is enhanced by using the continuity correction, as discussed in Chapter 7, but we will not pursue this point. Limits for an approximate $100(1 - \alpha)\%$ confidence interval $(p_L, p_U)$ for $p$ are obtained by solving for the smaller solution of

$$\frac{\hat{p} - p_L}{\sqrt{p_L(1 - p_L)/n}} = z_{1-\alpha/2} \qquad (11.3.15)$$

and the larger solution of

$$\frac{\hat{p} - p_U}{\sqrt{p_U(1 - p_U)/n}} = -z_{1-\alpha/2} \qquad (11.3.16)$$
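Equations (11.3.15) and (11.3.16) can be solved in closed form. Squaring either one gives the quadratic $(1 + c)p^2 - (2\hat{p} + c)p + \hat{p}^2 = 0$ with $c = z_{1-\alpha/2}^2/n$, whose two roots are $p_L$ and $p_U$. The result is what is commonly called the Wilson score interval. Below is a minimal Python sketch; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def score_interval(s, n, alpha=0.05):
    """Solve (p_hat - p)^2 = z^2 p(1 - p)/n; the smaller root is p_L and the larger is p_U."""
    p_hat = s / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c = z * z / n
    # Coefficients of (1 + c) p^2 - (2 p_hat + c) p + p_hat^2 = 0
    a = 1 + c
    b = -(2 * p_hat + c)
    d = p_hat * p_hat
    root = math.sqrt(b * b - 4 * a * d)  # discriminant = 4c p_hat(1 - p_hat) + c^2 >= 0
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


p_lo, p_hi = score_interval(s=2, n=10)
print(round(p_lo, 4), round(p_hi, 4))
```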
The common practice in this problem is to simplify the limits by using the limiting result that

$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} \xrightarrow{d} Z \sim N(0, 1) \qquad (11.3.17)$$

as $n \to \infty$, which was shown in Example 7.7.2. Thus, for large $n$, we also have the approximate result

$$P\left[-z_{1-\alpha/2} < \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} < z_{1-\alpha/2}\right] \approx 1 - \alpha \qquad (11.3.18)$$

This statement is much easier to invert, and approximate confidence limits for $p$ are given by

$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} \qquad (11.3.19)$$
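For comparison, here is a Python sketch of the simpler limits (11.3.19), again with a hypothetical function name. With only $n = 10$ trials the lower limit can fall below 0, which is a reminder that the approximation is intended for large $n$.

```python
import math
from statistics import NormalDist


def wald_interval(s, n, alpha=0.05):
    """Approximate limits (11.3.19): p_hat +/- z_{1-alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = s / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


small = wald_interval(2, 10)      # lower limit is negative here
large = wald_interval(200, 1000)  # same p_hat, much narrower interval
print(small, large)
```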


An important point here is that the random variables defined by expressions (11.3.13) and (11.3.17) are not pivotal quantities for any finite $n$, because their exact distributions depend on $p$. However, the limiting distribution is standard normal, which does not involve $p$. Hence the degree to which the exact distribution depends on $p$ should be small for large $n$, and the variables could be regarded as “approximate” pivotal quantities.


Other important distributions also admit approximate pivotal quantities. 


Consider a random sample of size $n$ from a Poisson distribution, $X_i \sim POI(\mu)$. By the CLT, we know that

$$\frac{\bar{X} - \mu}{\sqrt{\mu/n}} \xrightarrow{d} Z \sim N(0, 1) \qquad (11.3.20)$$

and thus by Theorem 7.7.4 that

$$\frac{\bar{X} - \mu}{\sqrt{\bar{X}/n}} \xrightarrow{d} Z \sim N(0, 1) \qquad (11.3.21)$$

as $n \to \infty$. Either of these random variables could be used to derive approximate confidence intervals, although expression (11.3.21) would be more convenient.
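Below is a Python sketch of the approximate Poisson limits $\bar{x} \pm z_{1-\alpha/2}\sqrt{\bar{x}/n}$, obtained by inverting (11.3.21). The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def poisson_mean_approx_interval(xs, alpha=0.05):
    """Approximate limits from (11.3.21): xbar +/- z_{1-alpha/2} * sqrt(xbar / n)."""
    n = len(xs)
    xbar = fmean(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(xbar / n)
    return xbar - half_width, xbar + half_width


counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]  # hypothetical Poisson counts
ci = poisson_mean_approx_interval(counts)
print(ci)
```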
Actually, it is possible to generalize this approach when MLEs are asymptotically normal (see Exercise 29).


GENERAL METHOD 


If a pivotal quantity is not available, then it is still possible to determine a confidence region for a parameter $\theta$ if a statistic exists with a distribution that depends on $\theta$ but not on any other unknown nuisance parameters. Specifically, let $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$, and $S = S(X_1, \ldots, X_n) \sim g(s; \theta)$. Preferably $S$ will be sufficient for $\theta$, or possibly some reasonable estimator such as an MLE, but this is not required.

Now, for each possible value of $\theta$, assume that we can find values $h_1(\theta)$ and $h_2(\theta)$ such that

$$P[h_1(\theta) < S < h_2(\theta)] = 1 - \alpha \qquad (11.4.1)$$

If we observe $S = s$, then the set of values $\theta \in \Omega$ that satisfy $h_1(\theta) < s < h_2(\theta)$ forms a $100(1 - \alpha)\%$ confidence region. In other words, if $\theta_0$ is the true value of $\theta$, then $\theta_0$ will be in the confidence region if and only if $h_1(\theta_0) < s < h_2(\theta_0)$, which has $100(1 - \alpha)\%$ confidence level because equation (11.4.1) holds with $\theta = \theta_0$ in this case. Quite often $h_1(\theta)$ and $h_2(\theta)$ will be monotonic increasing (or decreasing) functions of $\theta$, and the resulting confidence region will be an interval.

Example 11.4.1 Consider a random sample of size $n$ from the continuous distribution with pdf

$$f(x; \theta) = \begin{cases} (1/\theta^2)\exp[-(x - \theta)/\theta^2] & x > \theta \\ 0 & x \le \theta \end{cases}$$

with $\theta > 0$. There is no single sufficient statistic, but $X_{1:n}$ and $\sum X_i$ are jointly sufficient for $\theta$. It is desired to derive a 90% confidence interval for $\theta$ based on the statistic $S = X_{1:n}$. The CDF of $S$ is

$$G(s; \theta) = 1 - \exp[-n(s - \theta)/\theta^2], \quad s > \theta$$

One possible choice of functions $h_1(\theta)$ and $h_2(\theta)$ that satisfy equation (11.4.1) is obtained by solving

$$G(h_1(\theta); \theta) = 0.05$$

and

$$G(h_2(\theta); \theta) = 0.95$$
FIGURE 11.1 Functions $h_1(\theta)$ and $h_2(\theta)$ for the general method of constructing confidence intervals

This yields the functions

$$h_1(\theta) = \theta - \ln(0.95)\,\theta^2/n = \theta + 0.0513\,\theta^2/n$$

and

$$h_2(\theta) = \theta - \ln(0.05)\,\theta^2/n = \theta + 2.996\,\theta^2/n$$

The graphs of $h_1(\theta)$ and $h_2(\theta)$ with $n = 10$ are shown in Figure 11.1.

Suppose now that a sample of size $n = 10$ yields a minimum observation $s = x_{1:10} = 2.50$. The solutions of $2.50 = h_1(\theta)$ and $2.50 = h_2(\theta)$ are $\theta_U = 2.469$ and $\theta_L = 1.667$. Because $h_1(\theta)$ and $h_2(\theta)$ are increasing, the set of all $\theta > 0$ such that $h_1(\theta) < 2.50 < h_2(\theta)$ is the interval $(1.667, 2.469)$, which is a 90% confidence interval for $\theta$.
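Because $h_1$ and $h_2$ are quadratics in $\theta$ here, the limits come directly from the positive root of $(c/n)\theta^2 + \theta - s = 0$. Use $c = -\ln(1 - \alpha_1)$ for $h_1$ and $c = -\ln \alpha_2$ for $h_2$. The Python sketch below (function name hypothetical) reproduces the limits 1.667 and 2.469 after rounding.

```python
import math


def limits_min_example(s, n, alpha1=0.05, alpha2=0.05):
    """Solve h_i(theta) = theta + c_i * theta^2 / n = s for Example 11.4.1.

    G(h; theta) = 1 - exp[-n(h - theta)/theta^2] gives c1 = -ln(1 - alpha1) for h_1
    and c2 = -ln(alpha2) for h_2."""
    def positive_root(c):
        a = c / n  # (c/n) theta^2 + theta - s = 0
        return (-1 + math.sqrt(1 + 4 * a * s)) / (2 * a)

    theta_upper = positive_root(-math.log(1 - alpha1))  # h_1(theta_U) = s
    theta_lower = positive_root(-math.log(alpha2))      # h_2(theta_L) = s
    return theta_lower, theta_upper


theta_L, theta_U = limits_min_example(2.50, 10)
print(round(theta_L, 3), round(theta_U, 3))
```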


Because confidence limits in this approach are values of $\theta$ that satisfy $h_1(\theta) = s$ and $h_2(\theta) = s$, a more suggestive notation might be $\theta_L$ and $\theta_U$ rather than $\ell(x)$ and $u(x)$.

In general, if $h_1(\theta)$ and $h_2(\theta)$ are both increasing, then the endpoints of the confidence interval can be determined for any observed $s$. The lower limit $\theta_L$ is the solution of $h_2(\theta_L) = s$, and the upper limit $\theta_U$ is the solution of $h_1(\theta_U) = s$. The argument that $(\theta_L, \theta_U)$ is a $100(1 - \alpha)\%$ confidence interval is illustrated graphically by Figure 11.2.

If $\theta_0$ is the true value of $\theta$, then $P[h_1(\theta_0) < S < h_2(\theta_0)] = 1 - \alpha$, and whenever $h_1(\theta_0) < s < h_2(\theta_0)$, then $(\theta_L, \theta_U)$ contains $\theta_0$. Also, when $s$ falls outside the interval $(h_1(\theta_0), h_2(\theta_0))$, the resulting limits will not contain $\theta_0$, and the associated probability is $\alpha$. If $h_1(\theta)$ and $h_2(\theta)$ are both decreasing, then the argument is similar, but in this case $h_1(\theta_L) = s$ and $h_2(\theta_U) = s$. These results can be conveniently formulated in terms of the CDF of $S$.
FIGURE 11.2 A confidence interval based on the general method


Theorem 11.4.1 Let the statistic $S$ be continuous with CDF $G(s; \theta)$, and suppose that $h_1(\theta)$ and $h_2(\theta)$ are functions that satisfy

$$G(h_1(\theta); \theta) = \alpha_1 \qquad (11.4.2)$$

and

$$G(h_2(\theta); \theta) = 1 - \alpha_2 \qquad (11.4.3)$$

for each $\theta \in \Omega$, where $0 < \alpha_1 < 1$ and $0 < \alpha_2 < 1$. Let $s$ be an observed value of $S$. If $h_1(\theta)$ and $h_2(\theta)$ are increasing functions of $\theta$, then the following statements hold:

1. A one-sided lower $100(1 - \alpha_2)\%$ confidence limit, $\theta_L$, is the solution of
$$h_2(\theta_L) = s \qquad (11.4.4)$$
2. A one-sided upper $100(1 - \alpha_1)\%$ confidence limit, $\theta_U$, is the solution of
$$h_1(\theta_U) = s \qquad (11.4.5)$$
3. If $\alpha = \alpha_1 + \alpha_2$ and $0 < \alpha < 1$, then $(\theta_L, \theta_U)$ is a $100(1 - \alpha)\%$ confidence interval for $\theta$.


The theorem is modified easily for the case where $h_1(\theta)$ and $h_2(\theta)$ are decreasing. In particular, if $h_1(\theta_L) = s$, then $\theta_L$ is a one-sided lower $100(1 - \alpha_1)\%$ confidence limit, and if $h_2(\theta_U) = s$, then $\theta_U$ is a one-sided upper $100(1 - \alpha_2)\%$ confidence limit.
Example 11.4.2 Consider again a random sample from an exponential distribution, $X_i \sim EXP(\theta)$, and recall that $S = \sum X_i$ is sufficient for $\theta$. Because $2S/\theta \sim \chi^2(2n)$, we have

$$\alpha_1 = P[S < h_1(\theta)] = P[2S/\theta < 2h_1(\theta)/\theta]$$

which implies

$$2h_1(\theta)/\theta = \chi^2_{\alpha_1}(2n)$$

and

$$h_1(\theta) = \theta\,\chi^2_{\alpha_1}(2n)/2$$

This is an increasing function of $\theta$, and the solution of $h_1(\theta) = s$ provides a one-sided upper $100(1 - \alpha_1)\%$ confidence limit $\theta_U = 2s/\chi^2_{\alpha_1}(2n)$, which we also obtained by the pivotal quantity approach. The function $h_2(\theta)$ and $\theta_L$ are obtained in a similar manner.
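The Python sketch below works through Example 11.4.2. It computes both limits and then checks by simulation that the one-sided upper limit $\theta_U$ exceeds the true $\theta$ in roughly $100(1 - \alpha_1)\%$ of samples. The chi-square helpers, the true value $\theta = 3$, the sample size $n = 5$, and the seed are assumptions made only for this illustration.

```python
import math
import random


def chi2_cdf_even(x, df):
    """Chi-square CDF for even df = 2k: 1 - sum_{j<k} exp(-x/2) (x/2)^j / j!."""
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - total


def chi2_ppf_even(p, df):
    """Percentile chi2_p(df) for even df, by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exp_limits(s, n, alpha1=0.05, alpha2=0.05):
    """theta_L solves h_2(theta) = s and theta_U solves h_1(theta) = s, where
    h_1(theta) = theta*chi2_{alpha1}(2n)/2 and h_2(theta) = theta*chi2_{1-alpha2}(2n)/2."""
    q1 = chi2_ppf_even(alpha1, 2 * n)
    q2 = chi2_ppf_even(1 - alpha2, 2 * n)
    return 2 * s / q2, 2 * s / q1


# Coverage check of the one-sided upper limit under assumed values theta = 3, n = 5
rng = random.Random(2024)
n, theta_true, reps = 5, 3.0, 4000
q_alpha1 = chi2_ppf_even(0.05, 2 * n)  # computed once; it does not depend on the data
hits = 0
for _ in range(reps):
    s = sum(rng.expovariate(1.0 / theta_true) for _ in range(n))  # EXP(theta) has mean theta
    hits += (2 * s / q_alpha1) > theta_true
coverage = hits / reps
print(coverage)  # an estimate of the nominal 0.95
```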


There exist examples where $h_1(\theta)$ and $h_2(\theta)$ are not monotonic, and where the resulting confidence region is not an interval (see Exercise 25). Also note that in practice it is not necessary to know $h_1(\theta)$ and $h_2(\theta)$ for all $\theta$. What is needed is the values of $\theta$ for which $h_1(\theta) = s$ and $h_2(\theta) = s$, together with whether these functions are increasing or decreasing, in order to know which function gives an upper limit and which gives a lower limit. It can be shown that if $G(s; \theta)$ is a decreasing function of $\theta$ for each fixed $s$, then both $h_1(\theta)$ and $h_2(\theta)$ are increasing functions of $\theta$. This suggests the following theorem.


Theorem 11.4.2 Suppose that the statistic $S$ is continuous with CDF $G(s; \theta)$, and let $s$ be an observed value of $S$. If $G(s; \theta)$ is a decreasing function of $\theta$, then the following statements hold:

1. A one-sided lower $100(1 - \alpha_2)\%$ confidence limit, $\theta_L$, is provided by a solution of
$$G(s; \theta_L) = 1 - \alpha_2 \qquad (11.4.6)$$
2. A one-sided upper $100(1 - \alpha_1)\%$ confidence limit, $\theta_U$, is provided by a solution of
$$G(s; \theta_U) = \alpha_1 \qquad (11.4.7)$$
3. If $\alpha = \alpha_1 + \alpha_2$ and $0 < \alpha < 1$, then $(\theta_L, \theta_U)$ is a $100(1 - \alpha)\%$ confidence interval for $\theta$.
A similar theorem can be stated for the case where $G(s; \theta)$ is increasing in $\theta$. In particular, if $G(s; \theta_U) = 1 - \alpha_2$, then $\theta_U$ is a one-sided upper $100(1 - \alpha_2)\%$ confidence limit; and if $G(s; \theta_L) = \alpha_1$, then $\theta_L$ is a one-sided lower $100(1 - \alpha_1)\%$ confidence limit.


Example 11.4.3 Consider the statistic $S = X_{1:n}$ of Example 11.4.1. For any fixed $s$, with the substitution $t = s/\theta$, $G(s; \theta)$ can be written as $1 - \exp[-(n/s)(t - 1)t]$ for $t > 1$. The derivative with respect to $t$ is $(n/s)(2t - 1)\exp[-(n/s)(t - 1)t]$, which is positive because $2t - 1 > 0$ when $t > 1$. Consequently, $G(s; \theta)$ is an increasing function of $t$ and thus a decreasing function of $\theta = s/t$. It follows from the theorem that one-sided lower and upper 95% confidence limits for observed $s = 2.50$ and $n = 10$ are obtained by solving $G(2.50; \theta_L) = 0.95$ and $G(2.50; \theta_U) = 0.05$. These solutions are $\theta_L = 1.667$ and $\theta_U = 2.469$, as before.
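Theorem 11.4.2 only requires solving $G(s; \theta) = $ constant, and that can be done numerically even without a closed form. The Python sketch below uses bisection on $0 < \theta < s$, where $G(s; \theta)$ decreases from 1 to 0. The function names are hypothetical.

```python
import math


def cdf_min(s, theta, n):
    """G(s; theta) for S = X_{1:n} in Example 11.4.1 (valid for 0 < theta < s)."""
    return 1.0 - math.exp(-n * (s - theta) / theta ** 2)


def solve_decreasing(f, target, lo, hi, iters=200):
    """Bisection for f(theta) = target when f is decreasing on (lo, hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


s, n = 2.50, 10
theta_L = solve_decreasing(lambda t: cdf_min(s, t, n), 0.95, 1e-6, s)
theta_U = solve_decreasing(lambda t: cdf_min(s, t, n), 0.05, 1e-6, s)
print(round(theta_L, 3), round(theta_U, 3))
```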


It also is possible to state a more general theorem that includes discrete cases, but it is not always possible to achieve a prescribed confidence level when the observed statistic is discrete. However, “conservative” confidence intervals, in general, can be obtained.


Definition 11.4.1

An observed confidence interval $(\theta_L, \theta_U)$ is called a conservative $100(1 - \alpha)\%$ confidence interval for $\theta$ if the corresponding random interval contains the true value of $\theta$ with probability at least $1 - \alpha$.

Conservative one-sided confidence limits can be defined similarly.


Theorem 11.4.3 Let $S$ be a statistic with CDF $G(s; \theta)$, and let $h_1(\theta)$ and $h_2(\theta)$ be functions that satisfy $G(h_1(\theta); \theta) = \alpha_1$ and

$$P[S < h_2(\theta); \theta] = 1 - \alpha_2 \qquad (11.4.8)$$

where $0 < \alpha_1 < 1$ and $0 < \alpha_2 < 1$.

1. If $h_1(\theta)$ and $h_2(\theta)$ are increasing functions, then a conservative one-sided lower $100(1 - \alpha_2)\%$ confidence limit for $\theta$, based on an observed value $s$ of $S$, is a solution of $h_2(\theta_L) = s$, or $\theta = \theta_L$ such that
$$P[S < s; \theta_L] = 1 - \alpha_2 \qquad (11.4.9)$$
A conservative one-sided upper $100(1 - \alpha_1)\%$ confidence limit is a solution of $h_1(\theta_U) = s$, or $G(s; \theta_U) = \alpha_1$.
2. If $h_1(\theta)$ and $h_2(\theta)$ are decreasing functions, then a conservative one-sided lower $100(1 - \alpha_1)\%$ confidence limit is a solution of $h_1(\theta_L) = s$, or $G(s; \theta_L) = \alpha_1$. A conservative one-sided upper $100(1 - \alpha_2)\%$ confidence limit is a solution of $h_2(\theta_U) = s$, or $\theta = \theta_U$ such that
$$P[S < s; \theta_U] = 1 - \alpha_2 \qquad (11.4.10)$$
3. In either case, if $\alpha = \alpha_1 + \alpha_2$ and $0 < \alpha < 1$, then $(\theta_L, \theta_U)$ is a conservative $100(1 - \alpha)\%$ confidence interval for $\theta$.


An exact prescribed confidence level may not be achievable if $S$ is discrete, but the confidence levels will be at least the stated levels. This requires keeping the strict inequality in conditions (11.4.8), (11.4.9), and (11.4.10). Of course, if $S$ is continuous, then $P[S < s; \theta] = G(s; \theta)$, and the previous theorems apply, yielding exact confidence levels.

Consider the case of a discrete distribution, $G(s; \theta)$, where $h_1(\theta)$ is an increasing function and $G(s; \theta)$ is a decreasing function of $\theta$. Let $S$ assume the discrete values $s_1, s_2, \ldots$, and suppose that there are parameter values $\theta_1, \theta_2, \ldots$ such that $G(s_i; \theta_i) = \alpha$. If $S = s_i$ is observed, then let $\theta_U = \theta_i$ be the upper $100(1 - \alpha)\%$ confidence limit. The confidence level will be greater than $1 - \alpha$ for the intermediate values of $\theta$. If $\theta_{i-1} < \theta < \theta_i$, then the confidence interval will contain $\theta$ if the observed value of $S$ is greater than or equal to $s_i$, which will occur with probability

$$P[S \ge s_i \mid \theta_{i-1} < \theta < \theta_i] = 1 - G(s_{i-1}; \theta) \ge 1 - G(s_{i-1}; \theta_{i-1}) = 1 - \alpha$$

Similarly, suppose that $G(s_i; \theta_i) = 1 - \alpha$, and suppose $\theta_L = \theta_{i-1}$. That is, if $S = s_i$, then $\theta_L$ is the solution of $G(s_{i-1}; \theta_L) = 1 - \alpha$. Now consider a value $\theta_{i-1} < \theta < \theta_i$. This value will be in the confidence interval if $S \le s_i$, which occurs with probability

$$P[S \le s_i \mid \theta_{i-1} < \theta < \theta_i] = G(s_i; \theta) \ge G(s_i; \theta_i) = 1 - \alpha$$


Example 11.4.4 In Example 11.3.5, two approaches were presented for obtaining approximate confidence intervals for the binomial parameter $p$, based on large-sample approximations.

We now desire to derive a conservative one-sided $100(1 - \alpha)\%$ confidence limit for $p$. We know that $S = \sum X_i$ is a sufficient statistic, and $S \sim BIN(n, p)$. We will not find explicit expressions for $h_1(p)$ and $h_2(p)$ in this example, but note that
$G(s; p) = B(s; n, p)$ is a decreasing function of $p$. Thus, for an observed value $s$, a solution of

$$\alpha_1 = B(s; n, p_U) = \sum_{y=0}^{s} \binom{n}{y} p_U^y (1 - p_U)^{n - y}$$

is a conservative one-sided upper limit, and a solution of

$$1 - \alpha_2 = B(s - 1; n, p_L) = \sum_{y=0}^{s-1} \binom{n}{y} p_L^y (1 - p_L)^{n - y}$$

is a conservative one-sided lower limit. If $\alpha = \alpha_1 + \alpha_2$, then a conservative $100(1 - \alpha)\%$ confidence interval is given by $(p_L, p_U)$.

For a specified value of $s$, $p_L$ and $p_U$ can be determined by interpolation from a cumulative binomial table such as Table 1 (Appendix C), or they can be obtained by numerical methods applied to the CDF. For example, suppose that $n = 10$ and $s = 2$. If $\alpha_1 = 0.05$, then $B(2; 10, 0.5) = 0.0547$ and $B(2; 10, 0.55) = 0.0274$. Linear interpolation yields $p_U = 0.509$. By trying a few more values, we obtain a closer value, $B(2; 10, 0.507) = 0.05$, so that $p_U = 0.507$. Similarly, if $\alpha_2 = 0.05$, we can find $B(1; 10, 0.037) = 0.95$ and thus $p_L = 0.037$. It follows also that $(0.037, 0.507)$ is a conservative 90% confidence interval for $p$.
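The numerical search described above can be automated. The Python sketch below (function names hypothetical) solves both equations by bisection on $(0, 1)$, using the fact that $B(s; n, p)$ is decreasing in $p$. The resulting limits are the ones usually called the Clopper-Pearson interval. The conventions $p_L = 0$ when $s = 0$ and $p_U = 1$ when $s = n$ are assumptions added to handle the boundary cases.

```python
from math import comb


def binom_cdf(s, n, p):
    """B(s; n, p) = sum_{y=0}^{s} C(n, y) p^y (1 - p)^(n - y)."""
    return sum(comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(0, s + 1))


def solve_decreasing_in_p(f, target, iters=100):
    """Bisection on (0, 1) for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conservative_binomial_limits(s, n, alpha1=0.05, alpha2=0.05):
    """p_U solves B(s; n, p) = alpha1; p_L solves B(s - 1; n, p) = 1 - alpha2."""
    p_upper = 1.0 if s == n else solve_decreasing_in_p(lambda p: binom_cdf(s, n, p), alpha1)
    p_lower = 0.0 if s == 0 else solve_decreasing_in_p(lambda p: binom_cdf(s - 1, n, p), 1 - alpha2)
    return p_lower, p_upper


p_L, p_U = conservative_binomial_limits(2, 10)
print(round(p_L, 3), round(p_U, 3))
```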


Recall that methods for deriving approximate confidence intervals for the mean, $\mu$, of a Poisson distribution were discussed in Example 11.3.6. For a random sample of size $n$, $X_i \sim POI(\mu)$, a sufficient statistic is $S = \sum X_i$, and $S \sim POI(n\mu)$. Because the CDF of the Poisson distribution is related to the CDF of a chi-square distribution (see Exercise 21, Chapter 8), the confidence limits can be expressed conveniently in terms of chi-square percentiles. If we denote by $H(y; \nu)$ the CDF of a chi-square variable with $\nu$ degrees of freedom, then a conservative upper $100(1 - \alpha_1)\%$ confidence limit for $\mu$, for an observed value $s$, is a solution of

$$\alpha_1 = G(s; \mu_U) = 1 - H(2n\mu_U; 2s + 2)$$

which means that

$$2n\mu_U = \chi^2_{1-\alpha_1}(2s + 2)$$

and thus

$$\mu_U = \chi^2_{1-\alpha_1}(2s + 2)/2n$$

Similarly, a conservative lower $100(1 - \alpha_2)\%$ confidence limit for $\mu$ is a solution of

$$1 - \alpha_2 = G(s - 1; \mu_L) = 1 - H(2n\mu_L; 2s)$$

so that

$$2n\mu_L = \chi^2_{\alpha_2}(2s)$$

and

$$\mu_L = \chi^2_{\alpha_2}(2s)/2n$$

If $\alpha_1 = \alpha_2 = \alpha/2$, then a conservative $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by

$$\left(\chi^2_{\alpha/2}\left(2\sum x_i\right)/2n,\ \chi^2_{1-\alpha/2}\left(2\sum x_i + 2\right)/2n\right) \qquad (11.4.11)$$
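The Python sketch below computes interval (11.4.11), obtaining the chi-square percentiles by bisection on the even-degrees-of-freedom closed form. When $\sum x_i = 0$, $\chi^2(0)$ is degenerate, and setting the lower limit to 0 in that case is an assumption added for the boundary. The function names and data are hypothetical.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-square CDF for even df = 2k: 1 - sum_{j<k} exp(-x/2) (x/2)^j / j!."""
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - total


def chi2_ppf_even(p, df):
    """Percentile chi2_p(df) for even df >= 2, by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conservative_poisson_limits(xs, alpha=0.05):
    """Interval (11.4.11) with s = sum of the counts and n = sample size."""
    n, s = len(xs), sum(xs)
    lower = 0.0 if s == 0 else chi2_ppf_even(alpha / 2, 2 * s) / (2 * n)  # assumed convention for s = 0
    upper = chi2_ppf_even(1 - alpha / 2, 2 * s + 2) / (2 * n)
    return lower, upper


counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]  # hypothetical Poisson counts
print(conservative_poisson_limits(counts))
```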
The general method can be applied to any problem that contains a single unknown parameter. The following theorem may be helpful in identifying a statistic that can be used in the presence of location-scale nuisance parameters.


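Theorem 11.4.4 Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a distribution with pdf of the form

$$f(x; \theta_1, \theta_2, \kappa) = \frac{1}{\theta_2} f_0\left(\frac{x - \theta_1}{\theta_2}; \kappa\right) \qquad (11.4.12)$$

where $-\infty < \theta_1 < \infty$ and $\theta_2 > 0$, and $f_0(z; \kappa)$ is a pdf that depends on $\kappa$ but not on $\theta_1$ or $\theta_2$. If there exist MLEs $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\kappa}$, then the distributions of $(\hat{\theta}_1 - \theta_1)/\hat{\theta}_2$, $\hat{\theta}_2/\theta_2$, and $\hat{\kappa}$ do not depend on $\theta_1$ and $\theta_2$.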


It follows that the general method can be used with the statistic $\hat{\kappa}$ to determine confidence limits on $\kappa$, with $\theta_1$ and $\theta_2$ being unknown nuisance parameters. Of course, if $\kappa$ is known, then the pivotal quantities $(\hat{\theta}_1 - \theta_1)/\hat{\theta}_2$ and $\hat{\theta}_2/\theta_2$ can be used to find confidence intervals for $\theta_1$ and $\theta_2$. Theorem 11.3.2 also would apply in this situation, because $\theta_1$ and $\theta_2$ are location-scale parameters when $\kappa$ is known. It may not be clear how to derive confidence limits for $\theta_1$ and $\theta_2$ if $\kappa$ is unknown.


Theorem 11.4.5 Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a distribution with CDF $F(x; \theta_1, \theta_2)$ where $\theta_1$ and $\theta_2$ are location-scale parameters, and suppose that MLEs $\hat{\theta}_1$ and $\hat{\theta}_2$ exist. If $t$ is a fixed value, then $F(t; \hat{\theta}_1, \hat{\theta}_2)$ is a statistic whose distribution depends on $t$, $\theta_1$, and $\theta_2$ only through $F(t; \theta_1, \theta_2)$.


Consider the case where $F(x; \theta_1, \theta_2) = F_0[(x - \theta_1)/\theta_2]$ with $F_0(z)$ a one-to-one function. Let

$$c = (t - \theta_1)/\theta_2 = F_0^{-1}[F(t; \theta_1, \theta_2)]$$

which depends on $t$, $\theta_1$, and $\theta_2$ only through $F(t; \theta_1, \theta_2)$. It follows that

$$F(t; \hat{\theta}_1, \hat{\theta}_2) = F_0[(t - \hat{\theta}_1)/\hat{\theta}_2] = F_0\left[c\,(\theta_2/\hat{\theta}_2) - (\hat{\theta}_1 - \theta_1)/\hat{\theta}_2\right]$$

which is a function only of $c$ and the pivotal quantities $(\hat{\theta}_1 - \theta_1)/\hat{\theta}_2$ and $\hat{\theta}_2/\theta_2$. Consequently, its distribution depends on $F(t; \theta_1, \theta_2)$, but not on any other unknown nuisance parameters.


Consider the quantity

$$R(t) = P[X > t] = 1 - F(t; \theta_1, \theta_2)$$

This is an important quantity in applications where $X$ represents a failure time or lifetime of some experimental unit. In some applications it is called the “reliability” function, and in others it is called the “survivor” function. It follows from the previous theorem that the distribution of the MLE of reliability, $\hat{R}(t) = 1 - F(t; \hat{\theta}_1, \hat{\theta}_2)$, depends only on $R(t)$. Thus, the general method can be used to find confidence limits on $R(t)$.

For a more specific example, consider a random sample of size $n$ from a two-parameter exponential distribution, $X_i \sim EXP(\theta, \eta)$. The MLEs are $\hat{\eta} = X_{1:n}$, $\hat{\theta} = \bar{X} - X_{1:n}$, and

$$\hat{R}(t) = 1 - F(t; \hat{\theta}, \hat{\eta}) = \exp[-(t - \hat{\eta})/\hat{\theta}] = \exp[(\theta/\hat{\theta})\ln R(t) - (\eta - \hat{\eta})/\hat{\theta}]$$

If $Y = \hat{R}(t)$, then it can be shown that the CDF, $G(y; R(t))$, is decreasing in $R(t)$ for each fixed $y$. Thus, by Theorem 11.4.2, a one-sided lower $100(1 - \alpha)\%$ confidence limit, $R_L(t)$, is obtained by solving $G(\hat{R}(t); R_L(t)) = 1 - \alpha$. The CDF of $\hat{R}(t)$ is rather complicated in this case, and we will not attempt to derive it.


TWO-SAMPLE PROBLEMS 


Quite often random samples are taken for the purpose of comparing two or more populations. One may be interested in comparing the mean yields of two processes or the relative variation in yields of two processes. Confidence intervals are quite informative in making such comparisons.


TWO-SAMPLE NORMAL PROCEDURES 


Consider independent random samples of sizes $n_1$ and $n_2$ from two normally distributed populations $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$, respectively. Denote by $\bar{X}$, $\bar{Y}$, $S_1^2$, and $S_2^2$ the sample means and sample variances.

Suppose we wish to know whether one population has a smaller variance than the other. For example, two methods of producing baseballs might be compared to see which method produces baseballs with a smaller variation in their elasticity. The unbiased point estimators $S_1^2$ and $S_2^2$ can be computed. If there is only a small difference in the estimates, however, it may not be clear whether this difference results from a true difference in the variances or from random sampling error. In other words, if the distribution variances are the same, then the difference in sample variances, based on two more samples, might be just as likely to be in the other direction. Note also that if the sample sizes are large, then even a small difference between estimates may indicate a difference between parameter values; but if the sample sizes are small, then fairly large differences might result from chance. The confidence interval approach incorporates this kind of information into the interval estimate.


PROCEDURE FOR VARIANCES 


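A confidence interval for the ratio $\sigma_2^2/\sigma_1^2$ can be derived by using Snedecor’s $F$ distribution as suggested in Example 8.4.1. In particular, we know that

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1) \qquad (11.5.1)$$

which provides a pivotal quantity for $\sigma_2^2/\sigma_1^2$. Percentiles for the $F$ distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ can be obtained from Table 7 (Appendix C), so

$$P\left[f_{\alpha/2}(\nu_1, \nu_2) < \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} < f_{1-\alpha/2}(\nu_1, \nu_2)\right] = 1 - \alpha \qquad (11.5.2)$$

and thus, if $s_1^2$ and $s_2^2$ are estimates, then a $100(1 - \alpha)\%$ confidence interval for $\sigma_2^2/\sigma_1^2$ is given by

$$\left(\frac{s_2^2}{s_1^2} f_{\alpha/2}(n_1 - 1, n_2 - 1),\ \frac{s_2^2}{s_1^2} f_{1-\alpha/2}(n_1 - 1, n_2 - 1)\right) \qquad (11.5.3)$$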


Random samples of size $n_1 = 16$ and $n_2 = 21$ yield estimates $s_1^2 = 0.60$ and $s_2^2 = 0.20$, and a 90% confidence interval is desired. From Table 7 (Appendix C), $f_{0.95}(15, 20) = 2.20$ and $f_{0.05}(15, 20) = 1/f_{0.95}(20, 15) = 1/2.33 = 0.429$. It follows that $(0.143, 0.733)$ is a 90% confidence interval for $\sigma_2^2/\sigma_1^2$. Because the interval does not contain the value 1, we might conclude that $\sigma_2^2 \ne \sigma_1^2$ (or $\sigma_2^2/\sigma_1^2 \ne 1$), and that the two populations have different variances. Because the confidence level is 90%, only 10% of such conclusions, on the average, will be incorrect.
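The small Python sketch below computes interval (11.5.3), taking the tabulated $F$ percentiles as inputs, and reproduces the example. The function name is hypothetical.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_lower_pct, f_upper_pct):
    """Interval (11.5.3) for sigma_2^2 / sigma_1^2.

    f_lower_pct = f_{alpha/2}(n1 - 1, n2 - 1) and f_upper_pct = f_{1-alpha/2}(n1 - 1, n2 - 1),
    both read from an F table."""
    ratio = s2_sq / s1_sq
    return ratio * f_lower_pct, ratio * f_upper_pct


# Example values: f_{0.05}(15, 20) = 1/f_{0.95}(20, 15) = 1/2.33 and f_{0.95}(15, 20) = 2.20
lo, hi = variance_ratio_interval(0.60, 0.20, 1 / 2.33, 2.20)
print(round(lo, 3), round(hi, 3))
```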


This type of reasoning will be developed more formally in the next chapter. 


PROCEDURE FOR MEANS 


If the variances, $\sigma_1^2$ and $\sigma_2^2$, are known, then a pivotal quantity for the difference, $\mu_2 - \mu_1$, is easily obtained. Specifically, because

$$\bar{Y} - \bar{X} \sim N(\mu_2 - \mu_1,\ \sigma_1^2/n_1 + \sigma_2^2/n_2) \qquad (11.5.4)$$

it follows that

$$Z = \frac{\bar{Y} - \bar{X} - (\mu_2 - \mu_1)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1) \qquad (11.5.5)$$

With this choice of $Z$, the statement $P[-z_{1-\alpha/2} < Z < z_{1-\alpha/2}] = 1 - \alpha$ can be solved easily to obtain $100(1 - \alpha)\%$ confidence limits, namely

$$\bar{y} - \bar{x} \pm z_{1-\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \qquad (11.5.6)$$

In most cases the variances will not be known, but in some cases it will be reasonable to assume that the variances are unknown but equal. For example, one may wish to study the effect on mean yields when an additive or other modification is introduced in an existing method. In some cases it might be reasonable to assume that the additive could affect the mean yield but would not affect the variation in the process. If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the common variance can be eliminated in much the same way as in the one-sample case using a Student’s $t$ variable. A pooled estimator of the common variance is the weighted average

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \qquad (11.5.7)$$

and if

$$V = \frac{(n_1 + n_2 - 2)S_p^2}{\sigma^2} \qquad (11.5.8)$$

then, because the two terms below are independent chi-square variables with $n_1 - 1$ and $n_2 - 1$ degrees of freedom,

$$V = \frac{(n_1 - 1)S_1^2}{\sigma^2} + \frac{(n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2) \qquad (11.5.9)$$

It is also true that $\bar{X}$ and $\bar{Y}$ are independent of $S_1^2$ and $S_2^2$. So with $Z$ as given by equation (11.5.5), with $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and with $V$ given by equation (11.5.8), it follows from Theorem 8.4.1 that

$$T = \frac{\bar{Y} - \bar{X} - (\mu_2 - \mu_1)}{S_p\sqrt{1/n_1 + 1/n_2}} = \frac{Z}{\sqrt{V/(n_1 + n_2 - 2)}} \sim t(n_1 + n_2 - 2) \qquad (11.5.10)$$

Limits for a $100(1 - \alpha)\%$ confidence interval for $\mu_2 - \mu_1$ are given by

$$\bar{y} - \bar{x} \pm t_{1-\alpha/2}(n_1 + n_2 - 2)\, s_p\sqrt{1/n_1 + 1/n_2} \qquad (11.5.11)$$


Random samples of size $n_1 = 16$ and $n_2 = 21$ yield estimates $\bar{x} = 4.31$, $\bar{y} = 5.22$, $s_1^2 = 0.12$, and $s_2^2 = 0.10$. We might first consider a confidence interval for the ratio of variances to check the assumption of equal variances. A 90% confidence interval for $\sigma_2^2/\sigma_1^2$ is $(0.358, 1.83)$, which contains the value 1. Thus, there is not strong evidence that the variances are unequal, and we will assume $\sigma_1^2 = \sigma_2^2$. The pooled estimate of variance is $s_p^2 = [(15)(0.12) + (20)(0.10)]/35 = 0.109$, and $s_p = 0.330$. Suppose that a 95% confidence interval for $\mu_2 - \mu_1$ is desired. By linear interpolation between $t_{0.975}(30) = 2.042$ and $t_{0.975}(40) = 2.021$ in Table 6 (Appendix C), we obtain $t_{0.975}(35) = 2.032$. The desired confidence interval, based on the limits in equation (11.5.11), is $(0.688, 1.133)$.
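The Python sketch below computes limits (11.5.11) using the numbers of this example. The function name is hypothetical, and the $t$ percentile is passed in from the table. Without rounding $s_p$ to 0.330, the upper limit comes out as about 1.132 rather than 1.133.

```python
import math


def pooled_t_interval(xbar, ybar, s1_sq, s2_sq, n1, n2, t_pct):
    """Limits (11.5.11) for mu_2 - mu_1; t_pct = t_{1-alpha/2}(n1 + n2 - 2) from a table."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance (11.5.7)
    half_width = t_pct * math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)
    diff = ybar - xbar
    return diff - half_width, diff + half_width


# Numbers from the example; t_{0.975}(35) = 2.032 by interpolation in Table 6
lo, hi = pooled_t_interval(4.31, 5.22, 0.12, 0.10, 16, 21, 2.032)
print(round(lo, 3), round(hi, 3))
```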


APPROXIMATE METHODS 


It is not easy to eliminate unknown variances to obtain a pivotal quantity for $\mu_2 - \mu_1$ when the variances are unequal. One possible approach would be a large-sample method. Specifically, as $n_1$ and $n_2 \to \infty$,


$$\frac{\bar{Y} - \bar{X} - (\mu_2 - \mu_1)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \xrightarrow{d} Z \sim N(0, 1) \tag{11.5.12}$$

Thus, for large sample sizes, approximate confidence limits for $\mu_2 - \mu_1$ may be easily obtained from expression (11.5.12).

Note that the above limiting results also hold if the samples are not from normal distributions, so this provides a general large-sample result for differences of means. The size of the samples required to make the limiting approximation close would depend somewhat on the form of the densities.

For small samples from normal distributions, the distribution of the random variable in expression (11.5.12) depends on $\sigma_1^2$ and $\sigma_2^2$, but good small-sample approximations can be based on Student's $t$ distribution. One such approximation, which comes from Welch (1949), is

$$\frac{\bar{Y} - \bar{X} - (\mu_2 - \mu_1)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \;\dot{\sim}\; t(\nu) \tag{11.5.13}$$

where the degrees of freedom are estimated as follows:

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \tag{11.5.14}$$

Notice that this generally will produce noninteger degrees of freedom, but linear interpolation in Table 6 (Appendix C) can be used to obtain the required percentiles for constructing confidence intervals. The general problem of making inferences about $\mu_2 - \mu_1$ with unequal variances is known as the Behrens-Fisher problem. Welch's solution is just one of many that have been proposed in the statistical literature; it was studied by Wang (1971), who found it to be quite good.
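A Python sketch of Welch's degrees-of-freedom formula (11.5.14) and the resulting approximate interval is given below. The function names are illustrative, and, as in the text, the $t$ percentile for a noninteger $\nu$ is assumed to be supplied by the user (for example, by interpolation in a $t$ table).

```python
import math

def welch_df(s1_sq, n1, s2_sq, n2):
    """Estimated degrees of freedom (11.5.14) for the Welch approximation."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def welch_interval(xbar, ybar, s1_sq, n1, s2_sq, n2, t_crit):
    """Approximate limits for mu2 - mu1 based on (11.5.13).

    t_crit should be t_{1-alpha/2}(nu), e.g. interpolated from a t table,
    because nu is usually not an integer.
    """
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = ybar - xbar
    return diff - t_crit * se, diff + t_crit * se

# Reusing the summary statistics of the earlier example purely as an illustration
nu = welch_df(0.12, 16, 0.10, 21)
print(f"nu = {nu:.2f}")  # roughly 30.8, so interpolate between the 30 and 40 df rows
```

When the two sample variances and the two sample sizes are equal, the formula reduces to $\nu = 2n - 2$, the same degrees of freedom as the pooled procedure.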


PAIRED-SAMPLE PROCEDURE 


All of the above results assume that the random samples are independent. In some cases, such as test-retest experiments, dependent samples are appropriate. For example, to measure the effectiveness of a diet plan, we would select $n$ people at random and weigh them both before and after the diet. The observations would be independent between pairs, but the observations within a pair would not be independent because they were taken on the same individual.

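We have a random sample of $n$ pairs, $(X_i, Y_i)$, and we assume that the differences $D_i = Y_i - X_i$ for $i = 1, \ldots, n$ are normally distributed with mean $\mu_D = \mu_2 - \mu_1$ and variance $\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$, where $\sigma_{12} = \mathrm{Cov}(X_i, Y_i)$, or

$$D_i \sim N(\mu_2 - \mu_1, \sigma_D^2)$$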


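Let

$$\bar{D} = \sum_{i=1}^{n} D_i/n = \bar{Y} - \bar{X} \tag{11.5.15}$$

and

$$S_D^2 = \frac{n\sum_{i=1}^{n} D_i^2 - \left(\sum_{i=1}^{n} D_i\right)^2}{n(n - 1)} \tag{11.5.16}$$

It follows from the results of Chapter 6 that

$$T = \frac{\bar{D} - (\mu_2 - \mu_1)}{S_D/\sqrt{n}} \sim t(n - 1) \tag{11.5.17}$$

and thus a $(1 - \alpha)100\%$ confidence interval for $\mu_2 - \mu_1$ has limits of the form

$$\bar{d} \pm t_{1-\alpha/2}(n - 1)\, s_D/\sqrt{n} \tag{11.5.18}$$

where $\bar{d}$ and $s_D$ are observed.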
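The paired-sample computation, using the computational form (11.5.16) for $s_D^2$, can be sketched in Python as follows. The before/after weights are hypothetical numbers for illustration only, and 2.776 is the tabled value of $t_{0.975}(4)$ for five pairs.

```python
import math

def paired_interval(x, y, t_crit):
    """Limits (11.5.18) for mu2 - mu1 from paired observations (x_i, y_i).

    t_crit is t_{1-alpha/2}(n - 1), taken from a t table.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two complete pairs")
    d = [yi - xi for xi, yi in zip(x, y)]          # D_i = Y_i - X_i
    n = len(d)
    dbar = sum(d) / n                               # equation (11.5.15)
    sd_sq = (n * sum(di * di for di in d) - sum(d) ** 2) / (n * (n - 1))  # (11.5.16)
    sd = math.sqrt(sd_sq)
    half = t_crit * sd / math.sqrt(n)
    return dbar, sd, (dbar - half, dbar + half)

# Hypothetical before/after weights for five dieters (illustration only)
before = [182, 170, 195, 160, 175]
after = [178, 168, 190, 161, 170]
dbar, sd, (lo, hi) = paired_interval(before, after, t_crit=2.776)  # t_{0.975}(4)
print(f"dbar = {dbar:.2f}, s_D = {sd:.3f}, 95% interval = ({lo:.2f}, {hi:.2f})")
```

Only the differences enter the calculation, which is why the correlation within pairs affects the result solely through $\sigma_D^2$.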

Note that this method remains valid if the samples are independent, because in this case $D_i \sim N(\mu_2 - \mu_1, \sigma_1^2 + \sigma_2^2)$. However, the number of degrees of freedom in the paired-sample procedure is $n - 1$, whereas in the independent-sample case with $\sigma_1^2 = \sigma_2^2$ we obtained a $t$ statistic with $2n - 2$ degrees of freedom; so the effective sample size is twice as large in the independent-sample case, and consequently the paired-sample method would not be as good. However, if there is a reason for pairing and the pairs are highly correlated, then $\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$ may be much smaller than $\sigma_1^2 + \sigma_2^2$, and this could offset the loss in effective sample size. Thus pairing is a useful technique, but it should not be used indiscriminately.

It is interesting to note that if two independent samples have equal sample size, and if the variances are not equal, then the paired-sample procedure still can be used to provide an exact $t$ statistic, but the resulting confidence interval would tend to be wider than one based on an approximate $t$ variable such as that of expression (11.5.13).
TWO-SAMPLE BINOMIAL PROCEDURE

Suppose that $X_1 \sim BIN(n_1, p_1)$ and $X_2 \sim BIN(n_2, p_2)$. Letting $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$, from the results of Chapter 7 we have

$$\frac{\hat{p}_2 - \hat{p}_1 - (p_2 - p_1)}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}} \xrightarrow{d} Z \sim N(0, 1) \tag{11.5.19}$$

It is clear that approximate large-sample confidence limits for $p_2 - p_1$ can be obtained in a manner similar to the one-sample case, namely

$$\hat{p}_2 - \hat{p}_1 \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \tag{11.5.20}$$
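The limits (11.5.20) translate directly into a few lines of Python. This sketch uses the standard library's `statistics.NormalDist` for $z_{1-\alpha/2}$; the counts in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """Approximate large-sample limits (11.5.20) for p2 - p1."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1-alpha/2}
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p2_hat - p1_hat
    return diff - z * se, diff + z * se

# Hypothetical counts: 30 successes out of 100 versus 45 out of 100
lo, hi = two_proportion_interval(30, 100, 45, 100)
print(f"({lo:.3f}, {hi:.3f})")
```

Because the standard error is estimated from $\hat{p}_1$ and $\hat{p}_2$, the interval relies on the large-sample approximation (11.5.19) and is only approximate for moderate $n_1$ and $n_2$.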
BAYESIAN INTERVAL ESTIMATION


Bayes estimators were discussed briefly in Chapter 9 for the case of point estimation. There the parameter was treated mathematically as a random variable. In certain cases, this may be a physically meaningful assumption. For example, the parameter may behave as a variable over different conditions in the experiment. The prior density, $p(\theta)$, may be considered to reflect prior knowledge or belief about the true value of the parameter, and the Bayesian structure provides a convenient framework for using this prior belief to order the risk functions and select the best (smallest average risk) estimator. In a sense, the prior density is not unlike a class of confidence intervals indexed by $\alpha$. As $\alpha$ varies from 0 to 1, the resulting confidence intervals for $\theta$ could be represented as producing a probability distribution for $\theta$. The induced distribution in this case, however, is based on sample data rather than on subjective criteria.

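In any event, suppose that a prior density $p(\theta)$ exists or is introduced into the problem and $f(x; \theta)$ is interpreted as a conditional pdf, $f(x \mid \theta)$. Consider again the posterior density of $\theta$ given the sample $\mathbf{x} = (x_1, \ldots, x_n)$,

$$f_{\theta|\mathbf{x}}(\theta) = \frac{f(x_1, \ldots, x_n \mid \theta)\, p(\theta)}{\int f(x_1, \ldots, x_n \mid \theta)\, p(\theta)\, d\theta} \tag{11.6.1}$$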


The prior density $p(\theta)$ can be interpreted as specifying an initial probability distribution for the possible values of $\theta$, and in this context $f_{\theta|\mathbf{x}}(\theta)$ would represent a revised distribution adjusted by the observed random sample. For a particular $1 - \alpha$ level, a Bayesian confidence interval for $\theta$ is given by $(\theta_L, \theta_U)$ where $\theta_L$ and $\theta_U$ satisfy

$$\int_{\theta_L}^{\theta_U} f_{\theta|\mathbf{x}}(\theta)\, d\theta = 1 - \alpha \tag{11.6.2}$$


If $\theta$ is a true random variable, then the Bayesian interval would have the usual probability interpretation. Of course, in any such problem the results are correct only to the extent that the assumed models are correct. If $p(\theta)$ represents a degree of belief about the values of $\theta$, then presumably the interval $(\theta_L, \theta_U)$ also would be interpreted in the degree-of-belief sense.


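In Example 9.5.5 it was assumed that $X_i \sim POI(\theta)$ and that $\theta \sim GAM(\beta, \kappa)$. The posterior distribution was found to be given by

$$\theta \mid \mathbf{x} \sim GAM\left[(n + 1/\beta)^{-1}, \sum x_i + \kappa\right] \tag{11.6.3}$$

It follows that

$$2(n + 1/\beta)\,\theta \mid \mathbf{x} \sim \chi^2\left(2\left(\sum x_i + \kappa\right)\right) \tag{11.6.4}$$

and

$$P\left[\chi^2_{\alpha/2}(\nu) < 2(n + 1/\beta)\,\theta < \chi^2_{1-\alpha/2}(\nu)\right] = 1 - \alpha \tag{11.6.5}$$

where $\nu = 2(\sum x_i + \kappa)$. Thus, a $100(1 - \alpha)\%$ Bayesian confidence interval for $\theta$ is given by $(\theta_L, \theta_U)$ where $\theta_L = \chi^2_{\alpha/2}(\nu)/[2(n + 1/\beta)]$ and $\theta_U = \chi^2_{1-\alpha/2}(\nu)/[2(n + 1/\beta)]$.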
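When $\kappa$ is an integer, $\nu = 2(\sum x_i + \kappa)$ is even, and the chi-square percentiles in (11.6.5) can be computed without tables from the closed-form CDF for even degrees of freedom. The Python sketch below makes exactly that assumption (integer $\kappa$, integer counts $x_i$); for noninteger $\kappa$ one would instead use tabled percentiles. All function names and the sample counts are hypothetical.

```python
import math

def chi2_cdf_even(x, df):
    """Chi-square CDF for an even number of degrees of freedom df = 2k.

    Uses P[chi2(2k) <= x] = 1 - exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 0.0
    h = x / 2.0
    term, total = 1.0, 1.0              # j = 0 term of the Poisson sum
    for j in range(1, df // 2):
        term *= h / j
        total += term
    return 1.0 - math.exp(-h) * total

def chi2_quantile_even(p, df, tol=1e-10):
    """Percentile chi2_p(df) found by bisection on chi2_cdf_even."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:    # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def poisson_gamma_interval(xs, beta, kappa, alpha=0.05):
    """100(1 - alpha)% Bayesian interval (theta_L, theta_U) from (11.6.5).

    Prior theta ~ GAM(beta, kappa); xs are integer counts and kappa must be a
    positive integer here, so that nu = 2(sum x_i + kappa) is even.
    """
    if kappa != int(kappa) or kappa <= 0:
        raise ValueError("this sketch requires a positive integer kappa")
    nu = 2 * (sum(xs) + int(kappa))
    scale = 2.0 * (len(xs) + 1.0 / beta)
    return (chi2_quantile_even(alpha / 2, nu) / scale,
            chi2_quantile_even(1 - alpha / 2, nu) / scale)

# Hypothetical Poisson counts with the prior beta = 1, kappa = 2
theta_L, theta_U = poisson_gamma_interval([2, 0, 3, 1], beta=1, kappa=2)
print(f"({theta_L:.3f}, {theta_U:.3f})")
```

Bisection is used only because it is simple and always converges for a monotone CDF; any root finder would do.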


SUMMARY 


Our purpose in this chapter was to introduce the concept of an interval estimate 
or confidence interval. A point estimator in itself does not provide direct informa- 
tion about accuracy. An interval estimator gives one possible solution to this 
problem. The concept involves an interval whose endpoints are statistics that 
include the true value of the parameter between them with high probability. This 
probability corresponds to the confidence level of the interval estimator. Ordi- 
narily, the term confidence interval (or interval estimate) refers to the observed 
interval that is computed from data.

There are two basic methods for constructing confidence intervals. One 
method, which is especially useful in certain applications where unknown nui- 
sance parameters are present, involves the notion of a pivotal quantity. This 
amounts to finding a random variable that is a function of the observed random 
variables and the parameter of interest, but not of any other unknown param- 
eters. It also is required that the distribution of the pivotal quantity be free of any 
unknown parameters. In the case of location-scale parameters, pivotal quantities


can be expressed in terms of the MLEs if they exist. Approximate large-sample 
pivotal quantities can be based on asymptotic normal results in some cases. 

The other method, which is referred to as the general method, does not require the existence of a pivotal quantity, but has the disadvantage that it cannot be used when a nuisance parameter is present. This method can be applied with any statistic whose distribution can be expressed in terms of the parameter. The percentiles are functions of the parameter, and the limits of the confidence interval are obtained by solving equations that involve certain percentiles and the observed value of the statistic. Interval estimates obtained by either method can be interpreted in terms of the relative frequency with which the true value of the parameter will be included in the interval, which corresponds to the probability that the interval estimator will contain the true value. Another type of interval is based on the Bayesian approach. This approach provides a convenient way to use information or, in some cases, subjective judgment about the unknown parameter, although the relative frequency interpretation may be inappropriate in some instances.


EXERCISES 


1. Consider a random sample of size $n$ from a normal distribution, $X_i \sim N(\mu, \sigma^2)$.

(a) If it is known that $\sigma^2 = 9$, find a 90% confidence interval for $\mu$ based on the estimate $\bar{x} = 19.3$ with $n = 16$.

(b) Based on the information in (a), find a one-sided lower 90% confidence limit for $\mu$. Also, find a one-sided upper 90% confidence limit for $\mu$.

(c) For a confidence interval of the form given by expression (11.2.7), derive a formula for the sample size required to obtain an interval of specified length $A$. If $\sigma^2 = 9$, then what sample size is needed to achieve a 90% confidence interval of length 2?

(d) Suppose now that $\sigma^2$ is unknown. Find a 90% confidence interval for $\mu$ if $\bar{x} = 19.3$ and $s^2 = 10.24$ with $n = 16$.

(e) Based on the data in (d), find a 99% confidence interval for $\sigma^2$.


2. Assume that the weight data of Exercise 24, Chapter 4, are observed values of a random sample of size $n = 60$ from a normal distribution.
(a) Find a 99% confidence interval for the mean weight of major league baseballs.
(b) Find a 99% confidence interval for the standard deviation.


3. Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution, $X \sim EXP(\theta)$.

(a) If $\bar{x} = 17.9$ with $n = 50$, then find a one-sided lower 95% confidence limit for $\theta$.

(b) Find a one-sided lower 95% confidence limit for $P(X > t) = e^{-t/\theta}$, where $t$ is an arbitrary known value.
4. The following data are times (in hours) between failures of air conditioning equipment in a particular airplane: 74, 57, 48, 29, 502, 12, 70, 21, 29, 386, 59, 27, 153, 26, 326. Assume that the data are observed values of a random sample from an exponential distribution, $X_i \sim EXP(\theta)$.
(a) Find a 90% confidence interval for the mean time between failures, $\theta$.
(b) Find a one-sided lower 95% confidence limit for the 10th percentile of the distribution of time between failures.


5. Consider a random sample of size $n$ from an exponential distribution, $X_i \sim EXP(1, \eta)$.

(a) Show that $Q = X_{1:n} - \eta$ is a pivotal quantity and find its distribution.

(b) Derive a $100\gamma\%$ equal-tailed confidence interval for $\eta$.

(c) The following data are mileages for 19 military personnel carriers that failed in service: 162, 200, 271, 320, 393, 508, 539, 629, 706, 777, 884, 1008, 1101, 1182, 1463, 1603, 1984, 2355, 2880. Assuming that these data are observations of a random sample from an exponential distribution, find a 90% confidence interval for $\eta$. Assume that $\theta = 850$ is known.


6. Let $X_1, \ldots, X_n$ be a random sample from a two-parameter exponential distribution, $X_i \sim EXP(\theta, \eta)$.
(a) Assuming it is known that $\eta = 150$, find a pivotal quantity for the parameter $\theta$ based on the sufficient statistic.
(b) Using the data of Exercise 5, find a one-sided lower 95% confidence limit for $\theta$.


7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Weibull distribution, $X \sim WEI(\theta, 2)$.
(a) Show that $Q = 2\sum_{i=1}^{n} X_i^2/\theta^2 \sim \chi^2(2n)$.
(b) Use $Q$ to derive an equal-tailed $100\gamma\%$ confidence interval for $\theta$.
(c) Find a lower $100\gamma\%$ confidence limit for $P(X > t) = \exp[-(t/\theta)^2]$.
(d) Find an upper $100\gamma\%$ confidence limit for the $p$th percentile of the distribution.


8. Consider a random sample of size $n$ from a uniform distribution, $X_i \sim UNIF(0, \theta)$, $\theta > 0$, and let $X_{n:n}$ be the largest order statistic.

(a) Find the probability that the random interval $(X_{n:n}, 2X_{n:n})$ contains $\theta$.

(b) Find the constant $c$ such that $(x_{n:n}, c\,x_{n:n})$ is a $100(1 - \alpha)\%$ confidence interval for $\theta$.


9. Use the approach of Example 11.3.4 with the data of Example 4.6.2 to find a 95% confidence interval for $\kappa$.


10. Suppose that the exact values of the data $x_1, \ldots, x_{50}$ in Exercise 3(b) are not known, but it is known that 40 of the 50 measurements are larger than $t$.
(a) Find an approximate one-sided lower 95% confidence limit for $P(X > t)$ based on this information.
(b) Note that under the exponential assumption, $P(X > t) = \exp(-t/\theta)$. If $t = 5$, use the result from (a) to find an approximate one-sided lower 95% confidence limit for $\theta$ and compare this to the confidence limit of Exercise 3(a).
11. Let $p$ be the proportion of people in the United States with red hair. In a sample of size 40, five people with red hair were observed. Find an approximate 90% confidence interval for $p$.


12. Suppose that 45 workers in a textile mill are selected at random in a study of accident rate. The number of accidents per worker is assumed to be Poisson distributed with mean $\mu$. The average number of accidents per worker is $\bar{x} = 1.7$.
(a) Find an approximate one-sided lower 90% confidence limit for $\mu$ using equation (11.3.20).
(b) Repeat (a) using equation (11.3.21) instead.
(c) Find a conservative one-sided lower 90% confidence limit for $\mu$ using the approach of Example 11.4.5.


13. Consider a random sample of size $n$ from a gamma distribution, $X_i \sim GAM(\theta, \kappa)$.
(a) Assuming $\kappa$ is known, derive a $100(1 - \alpha)\%$ equal-tailed confidence interval for $\theta$ based on the sufficient statistic.
(b) Assuming that $\theta = 1$ and $n = 1$, find an equal-tailed 90% confidence interval for $\kappa$ if $x_1 = 10$ is observed. Hint: Note that $2X_1 \sim \chi^2(2\kappa)$, and use interpolation in Table 4 (Appendix C).


14. Assume that the number of defects in a piece of wire that is $t$ yards in length is $X \sim POI(\lambda t)$ for any $t > 0$.
(a) If five defects are found in a 100-yard roll of wire, find a conservative one-sided upper 95% confidence limit for the mean number of defects in such a roll.
(b) If a total of 15 defects are found in five 100-yard rolls of wire, find a conservative one-sided upper 95% confidence limit for $\lambda$.


15. Let $X_1, \ldots, X_n$ be a random sample from a Weibull distribution, $X_i \sim WEI(\theta, \beta)$, where $\beta$ is known.
(a) Use the general method of Section 11.4 to derive a $100(1 - \alpha)\%$ confidence interval for $\theta$ based on the statistic $S_1 = X_{1:n}$.
(b) Use the general method to find a $100(1 - \alpha)\%$ confidence interval for $\theta$ based on the statistic $S_2 = \sum X_i^\beta$.


16. Let $f(x; p) = p f_{X_1}(x) + (1 - p) f_{X_2}(x)$, where $X_1 \sim N(1, 1)$ and $X_2 \sim N(0, 1)$. Based on a sample of size $n = 1$ from $f(x; p)$, derive a one-sided lower $100\gamma\%$ confidence limit for $p$.


17. Suppose that $X \sim GEO(p)$.
(a) Derive a conservative one-sided lower $100\gamma\%$ confidence limit for $p$ based on a single observation $x$.
(b) If $x = 5$, find a conservative one-sided lower 90% confidence limit for $p$.
(c) If $X_1, \ldots, X_n$ is a random sample from $GEO(p)$, describe the form of a conservative one-sided lower $100\gamma\%$ confidence limit for $p$ based on the sufficient statistic.


18. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution, $X \sim N(\mu, \sigma^2)$. If $t$ is a fixed real number, find a statistic that is a function of the sufficient statistics and whose distribution depends on $t$, $\mu$, and $\sigma^2$ only through $F(t; \mu, \sigma^2) = P(X \le t)$.
19. Consider independent random samples from two normal distributions, $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$; $i = 1, \ldots, n_1$, $j = 1, \ldots, n_2$. Assuming that $\mu_1$ and $\mu_2$ are known, derive a $100(1 - \alpha)\%$ confidence interval for $\sigma_2^2/\sigma_1^2$ based on sufficient statistics.


20. Consider independent random samples from two exponential distributions, $X_i \sim EXP(\theta_1)$ and $Y_j \sim EXP(\theta_2)$; $i = 1, \ldots, n_1$, $j = 1, \ldots, n_2$.

(a) Show that $(\theta_2/\theta_1)(\bar{X}/\bar{Y}) \sim F(2n_1, 2n_2)$.

(b) Derive a $100\gamma\%$ confidence interval for $\theta_2/\theta_1$.


21. Compute a 95% Bayesian confidence interval for $\mu$ based on the results of Exercise 12. Use the prior density of Example 11.6.1 with $\beta = 1$ and $\kappa = 2$.


22. Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli distribution, $X_i \sim BIN(1, \theta)$, and assume a uniform prior $\theta \sim UNIF(0, 1)$. Derive a $100(1 - \alpha)\%$ Bayesian interval estimate of $\theta$. Hint: The posterior distribution is given in Example 9.5.4.


23. Using the densities $f(x \mid \theta)$ and $p(\theta)$ of Exercise 44 of Chapter 9:
(a) Derive a $100(1 - \alpha)\%$ Bayesian confidence interval for $\theta$.
(b) Derive a $100(1 - \alpha)\%$ Bayesian confidence interval for $\tau = 1/\theta$.
(c) Compute a 90% Bayesian confidence interval for $\theta$ if $n = 10$, $\bar{x} = 5$, and $\beta = 2$.


24. Suppose that $\theta_L$ and $\theta_U$ are one-sided lower and upper confidence limits for $\theta$ with confidence coefficients $1 - \alpha_1$ and $1 - \alpha_2$, respectively. Show that $(\theta_L, \theta_U)$ is a conservative $100(1 - \alpha)\%$ confidence interval for $\theta$ if $\alpha = \alpha_1 + \alpha_2 < 1$. Hint: Use Bonferroni's inequality (1.4.7), with

$$[\theta_L < \theta < \theta_U] = [\theta_L < \theta] \cap [\theta < \theta_U]$$
25. Consider a random sample of size $n$ from a distribution with CDF

$$F(x; \theta) = \begin{cases} 1 - \exp[-\theta(x - \theta)] & x \ge \theta \\ 0 & x < \theta \end{cases}$$

with $\theta > 0$.

(a) Find the CDF, $G(s; \theta)$, of $S = X_{1:n}$.

(b) Find the function $h(\theta)$ such that $G(h(\theta); \theta) = 1 - \alpha$, and show that it is not monotonic.

(c) Show that $h(\theta) = s$ has two solutions, $\theta_1 = [s - \sqrt{s^2 + 4(\ln \alpha)/n}]/2$ and $\theta_2 = [s + \sqrt{s^2 + 4(\ln \alpha)/n}]/2$, if $s^2 > -4(\ln \alpha)/n$, and that $h(\theta) > s$ if and only if either $\theta < \theta_1$ or $\theta > \theta_2$. Thus, $(0, \theta_1) \cup (\theta_2, \infty)$ is a $100(1 - \alpha)\%$ confidence region for $\theta$, but it is not an interval.


26. Consider the equal-tailed confidence interval of equation (11.2.6).

(a) Use the fact that $Q = 2n\bar{X}/\theta \sim \chi^2(2n)$ to derive a formula for the expected length of the corresponding random interval.

(b) More generally, a $100(1 - \alpha)\%$ confidence interval for $\theta$ has the form $(2n\bar{x}/q_2, 2n\bar{x}/q_1)$ where $F_Q(q_1) = \alpha_1$ and $F_Q(q_2) = 1 - \alpha_2$, with $\alpha = \alpha_1 + \alpha_2 < 1$, and the expected length is proportional to $(1/q_1 - 1/q_2)$. Note that $q_2$ is an implicit function of $q_1$, because $F_Q(q_2) - F_Q(q_1) = 1 - \alpha$ (which is fixed), and consequently that $f_Q(q_2)\, dq_2/dq_1 = f_Q(q_1)$. Use this to show that the values of $q_1$ and $q_2$ that minimize $(1/q_1 - 1/q_2)$ must satisfy $q_1^2 f_Q(q_1) = q_2^2 f_Q(q_2)$, which is not the equal-tailed choice for a chi-square pdf.


27. Consider the equal-tailed confidence interval of equation (11.2.7). More generally, if $Z = \sqrt{n}(\bar{X} - \mu)/\sigma$, then a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by $(\bar{x} - z_2\sigma/\sqrt{n}, \bar{x} - z_1\sigma/\sqrt{n})$ where $\Phi(z_1) = \alpha_1$ and $\Phi(z_2) = 1 - \alpha_2$, with $\alpha = \alpha_1 + \alpha_2 < 1$.
(a) Show that the interval of this form with minimum length is given by equation (11.2.7).
(b) In the case when $\sigma^2$ is unknown, can a similar claim be made about the expected length of the random interval corresponding to equation (11.3.5)?
(c) In a manner similar to that of Exercise 26, show that the equal-tailed confidence interval given by equation (11.3.6) does not have minimum expected length.


28. Based on the pivotal quantities $Q_1$ and $Q_2$ given by equations (11.3.8) and (11.3.9), derive one-sided lower $100\gamma\%$ confidence limits for the parameters $\theta_1$ and $\theta_2$ of the extreme-value distribution. Leave the answer in terms of arbitrary percentiles $q_1$ and $q_2$.


29. As noted in Section 9.4, under certain regularity conditions, the MLEs $\hat{\theta}_n$ are asymptotically normal, $N(\theta, c^2(\theta)/n)$, where $c^2(\theta)/n$ is the CRLB. Assuming further that $c(\theta)$ is a continuous function of $\theta$, it follows from Theorem 7.7.2 that $c(\hat{\theta}_n)$ converges stochastically to $c(\theta)$.
(a) Using other results from Chapter 7, show that if $Z_n = \sqrt{n}(\hat{\theta}_n - \theta)/c(\hat{\theta}_n)$, then $Z_n \xrightarrow{d} Z \sim N(0, 1)$ as $n \to \infty$.
(b) From (a), show that limits for an approximate large-sample $100(1 - \alpha)\%$ confidence interval are given by $\hat{\theta}_n \pm z_{1-\alpha/2}\, c(\hat{\theta}_n)/\sqrt{n}$.
(c) Based on the results of Example 9.4.7, derive an approximate $100(1 - \alpha)\%$ confidence interval for the parameter $\kappa$ where $X_i \sim PAR(1, \kappa)$.
(d) Use the data of Example 4.6.2 to find an approximate 95% confidence interval for $\kappa$, and compare this with the confidence interval of Exercise 9. Would you expect a close approximation in this example?


30. Suppose that $\hat{\theta}_n$ is asymptotically normal, $N(\theta, c^2(\theta)/n)$. It sometimes is desirable to consider a function, say $g(\theta)$, such that the asymptotic variance of $g(\hat{\theta}_n)$ does not depend on $\theta$. Such a function is called a variance-stabilizing transformation. If we apply Theorem 7.7.6 with $Y_n = \hat{\theta}_n$, $m = \theta$, and $c = c(\theta)$, then $g(\theta)$ would have to satisfy the equation $[c(\theta)g'(\theta)]^2 = k$, a constant.
(a) If $X_1, \ldots, X_n$ is a random sample from a Poisson distribution, $X_i \sim POI(\mu)$, and $\hat{\theta}_n = \bar{X}$, show that $g(\mu) = \sqrt{\mu}$ is a variance-stabilizing transformation.
(b) Derive an approximate large-sample $100(1 - \alpha)\%$ confidence interval for $\mu$ based on $g(\bar{X})$.
(c) Consider a random sample of size $n$ from $EXP(\theta)$, and let $\hat{\theta}_n = \bar{X}$. Find a variance-stabilizing transformation and use it to derive an approximate large-sample confidence interval for $\theta$.


TESTS OF
HYPOTHESES


12.1


INTRODUCTION


In scientific activities, much attention is devoted to answering questions about 
the validity of theories or hypotheses concerning physical phenomena. Is a new 
drug effective? Does a lot of manufactured items contain an excessive number of 
defectives? Is the mean lifetime of a component at least some specified amount? 
Ordinarily, information about such phenomena can be obtained only by per- 
forming experiments whose outcomes have some bearing on the hypotheses of 
interest. The term hypothesis testing will refer to the process of trying to decide the truth or falsity of such hypotheses on the basis of experimental evidence.

For instance, we may suspect that a certain hypothesis, perhaps an accepted 
theory, is false, and an experiment is conducted. An outcome that is inconsistent 
with the hypothesis will cast doubt on its validity. For example, the hypothesis to 
be tested may specify that a physical constant has the value $\theta_0$. In general, experimental measurements are subject to random error, and thus any decision about the truth or falsity of the hypothesis, based on experimental evidence, also is subject to error. It will not be possible to avoid an occasional decision error, but it will be possible to construct tests so that such errors occur infrequently and at some prescribed rate.

A simple example illustrates the concept of hypothesis testing. 


A theory proposes that the yield of a certain chemical reaction is normally distributed, $X \sim N(\mu, 16)$. Past experience indicates that $\mu = 10$ if a certain mineral is not present, and $\mu = 11$ if the mineral is present. Our experiment would be to take a random sample of size $n$. On the basis of that sample, we would try to decide which case is true. That is, we wish to test the "null hypothesis" $H_0: \mu = \mu_0 = 10$ against the "alternative hypothesis" $H_1: \mu = \mu_1 = 11$.


Definition 12.1.1


If $X \sim f(x; \theta)$, a statistical hypothesis is a statement about the distribution of $X$. If the hypothesis completely specifies $f(x; \theta)$, then it is referred to as a simple hypothesis; otherwise it is called composite.


Quite often the distribution in question has a known parametric form with a single unknown parameter $\theta$, and the hypothesis consists of a statement about $\theta$. In this framework, a statistical hypothesis corresponds to a subset of the parameter space, and the objective of a test would be to decide whether the true value of the parameter is in the subset. Thus, a null hypothesis would correspond to a subset $\Omega_0$ of $\Omega$, and the alternative hypothesis would correspond to its complement, $\Omega - \Omega_0$. In the case of simple hypotheses, these sets consist of only one element each, $\Omega_0 = \{\theta_0\}$ and $\Omega - \Omega_0 = \{\theta_1\}$, where $\theta_0 \ne \theta_1$.

Most experiments have some goal or research hypothesis that one hopes to support with statistical evidence, and this hypothesis should be taken as the alternative hypothesis. The reason for this will become clear as we proceed. In our example, if we have strong evidence that the mineral is present, then we may wish to spend a large amount of money to begin mining operations, so we associate this case with the alternative hypothesis. We now must consider sample data, and decide on the basis of the data whether we have sufficient statistical evidence to reject $H_0$ in favor of the alternative $H_1$, or whether we do not have sufficient evidence. That is, our philosophy will be to divide the sample space $S$ into two regions, the "critical region" or "rejection region" $C$, and the nonrejection region $S - C$. If the observed sample data fall in $C$, then we will reject $H_0$, and if they do not fall in $C$, then we will not reject $H_0$.
Definition 12.1.2


The critical region for a test of hypotheses is the subset of the sample space that corresponds to rejecting the null hypothesis.


In our example, $\bar{X}$ is a sufficient statistic for $\mu$, so we may conveniently express the critical region directly in terms of the univariate variable $\bar{X}$, and we will refer to $\bar{X}$ as the test statistic. Because $\mu_1 > \mu_0$, a natural form for the critical region in this problem is to let $C = \{(x_1, \ldots, x_n) \mid \bar{x} \ge c\}$, for some appropriate constant $c$. That is, we will reject $H_0$ if $\bar{x} \ge c$, and we will not reject $H_0$ if $\bar{x} < c$. There are two possible errors we may make under this procedure. We might reject $H_0$ when $H_0$ is true, or we might fail to reject $H_0$ when $H_0$ is false. These errors are referred to as follows:


1. Type I error: Reject a true $H_0$.
2. Type II error: Fail to reject a false $H_0$.


Occasionally, for convenience, we may refer to the Type II error as "accepting a false $H_0$," and to $S - C$ as the "acceptance" region, but it should be understood that this is not strictly a correct interpretation. That is, failure to have enough statistical evidence to reject $H_0$ is not the same as having strong evidence to support $H_0$.

We hope to choose a test statistic and a critical region so that we would have a small probability of making these two errors. We will adopt the following notations for these error probabilities:


1. P[Type I error] = P[TI] = $\alpha$.
2. P[Type II error] = P[TII] = $\beta$.


Definition 12.1.3


For a simple null hypothesis, $H_0$, the probability of rejecting a true $H_0$, $\alpha = P[\mathrm{TI}]$, is referred to as the significance level of the test. For a composite null hypothesis, $H_0$, the size of the test (or size of the critical region) is the maximum probability of rejecting $H_0$ when $H_0$ is true (maximized over the values of the parameter under $H_0$).


Notice that for a simple $H_0$ the significance level is also the size of the test.

The standard approach is to specify or select some acceptable level of error such as $\alpha = 0.05$ or $\alpha = 0.01$ for the significance level of the test, and then to determine a critical region that will achieve this $\alpha$. Among all critical regions of size $\alpha$ we would select the one that has the smallest P[TII]. In our example, if $n = 25$, then $\alpha = 0.05$ gives $c = \mu_0 + z_{1-\alpha}\sigma/\sqrt{n} = 10 + 1.645(4)/5 = 11.316$. This is verified easily, because

$$\begin{aligned} P[\bar{X} \ge c \mid \mu = \mu_0 = 10] &= P\left[\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge \frac{c - \mu_0}{\sigma/\sqrt{n}}\right] \\ &= P\left[Z \ge \frac{11.316 - 10}{4/5}\right] \\ &= P[Z \ge 1.645] \\ &= 0.05 \end{aligned}$$


Thus, a size 0.05 test of $H_0: \mu = 10$ against the alternative $H_1: \mu = 11$ is to reject $H_0$ if the observed value $\bar{x} \ge 11.316$. Note that this critical region provides a size 0.05 test for any alternative value $\mu = \mu_1$, but the fact that $\mu_1 > \mu_0$ means that we will get a smaller Type II error by taking the critical region as the right-hand tail of the distribution of $\bar{X}$ rather than as some other region of size 0.05. For an alternative $\mu_1 < \mu_0$ the left-hand tail would be preferable. Thus, the alternative affects our choice for the location of the critical region, but it is otherwise determined under $H_0$ for specified $\alpha$.
The probability of Type II error for the critical region $C$ is

$$\begin{aligned} \beta = P[\mathrm{TII}] &= P[\bar{X} < 11.316 \mid \mu = \mu_1 = 11] \\ &= P\left[\frac{\bar{X} - 11}{4/5} < \frac{11.316 - 11}{4/5}\right] \\ &= P[Z < 0.395] = 0.654 \end{aligned}$$

These concepts are illustrated in Figure 12.1.
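The critical value and the Type II error probability above can be reproduced with Python's standard-library `statistics.NormalDist`. The function name is illustrative; the sketch assumes, as in the example, normal data with known $\sigma$ and an alternative $\mu_1 > \mu_0$.

```python
from statistics import NormalDist

def upper_tail_test(mu0, mu1, sigma, n, alpha=0.05):
    """Critical value c and P[TII] for rejecting H0: mu = mu0 when xbar >= c.

    Assumes normal data with known sigma and an alternative mu1 > mu0.
    """
    se = sigma / n ** 0.5
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # c = mu0 + z_{1-alpha} * sigma / sqrt(n)
    beta = NormalDist(mu1, se).cdf(c)                # P[Xbar < c | mu = mu1]
    return c, beta

c, beta = upper_tail_test(10, 11, 4, 25)
print(f"c = {c:.3f}, P[TII] = {beta:.3f}")
```

With $n = 25$ this reproduces $c \approx 11.316$ and P[TII] $\approx 0.654$; calling it with $n = 100$ gives $c \approx 10.658$ and P[TII] $\approx 0.196$.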

At this point, there is no theoretical reason for choosing a critical region of the form $C$ over any other. For example, the critical region $C_1 = \{(x_1, \ldots, x_n) \mid 10 < \bar{x} < 10.1006\}$ also has size $\alpha = 0.05$ because, under $H_0$,

$$P[10 < \bar{X} < 10.1006] = P\left[\frac{10 - 10}{4/5} < Z < \frac{10.1006 - 10}{4/5}\right] = P[0 < Z < 0.1257] = 0.05$$

FIGURE 12.1 Probabilities of Type I and Type II errors
However, P[TII] for this critical region is

$$\begin{aligned} P[\mathrm{TII}] &= 1 - P[10 < \bar{X} < 10.1006 \mid \mu = 11] \\ &= 1 - P\left[\frac{10 - 11}{4/5} < Z < \frac{10.1006 - 11}{4/5}\right] \\ &= 1 - P[-1.25 < Z < -1.12425] \\ &= 0.9752 \end{aligned}$$

This critical region is much worse than using the right-hand tail of the distribution of $\bar{X}$ under $H_0$.
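The comparison between the right-tail region $C$ and the interval region $C_1$ can be checked numerically. This Python sketch uses only `statistics.NormalDist` and the numbers of the example ($\sigma = 4$, $n = 25$); the variable names are illustrative.

```python
from statistics import NormalDist

se = 4 / 25 ** 0.5                  # sigma / sqrt(n) with sigma = 4, n = 25
under_h0 = NormalDist(10, se)       # distribution of Xbar when mu = 10
under_h1 = NormalDist(11, se)       # distribution of Xbar when mu = 11

# Right-tail region C = {xbar >= 11.316}
size_C = 1 - under_h0.cdf(11.316)
type2_C = under_h1.cdf(11.316)

# Interval region C1 = {10 < xbar < 10.1006}
size_C1 = under_h0.cdf(10.1006) - under_h0.cdf(10)
type2_C1 = 1 - (under_h1.cdf(10.1006) - under_h1.cdf(10))

print(f"C : size {size_C:.3f}, P[TII] {type2_C:.3f}")
print(f"C1: size {size_C1:.3f}, P[TII] {type2_C1:.4f}")
```

Both regions have size close to 0.05, but the Type II error of $C_1$ (about 0.975) is far larger than that of $C$ (about 0.654), which is the point of the comparison.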

In this example there is a good chance of getting a relatively large $\bar{x}$ even when $\mu = 10$, so we must have $\bar{x} \ge 11.316$ to be sure (at the $1 - \alpha = 0.95$ level) that we are correct when we reject $H_0$. Clearly, we may fail to reject quite often when we should reject. This illustrates why we associate the alternative hypothesis with what we hope to establish. If we do reject $H_0$ and adopt the alternative hypothesis, then we have a controlled error rate $\alpha$ of being wrong. If we do not reject $H_0$, then $H_0$ may be true or we may have made a Type II error, which can happen with probability 0.654 in our example. Thus we would not feel secure in advocating $H_0$ just because we did not have sufficient evidence to reject it. This is also why we prefer to say that we "do not reject $H_0$" rather than that we "accept $H_0$" or believe $H_0$ to be true; we simply would not have sufficient statistical evidence to declare $H_0$ false at the 0.05 level of significance. Of course, this point is not so important if the Type II error also is known to be small. For example, suppose that a sample size $n = 100$ is available. To maintain a critical region of size $\alpha = 0.05$, we would now use

$$c = \mu_0 + z_{1-\alpha}\sigma/\sqrt{n} = 10 + 1.645(4)/10 = 10.658$$

The value of P[TII] in this case is

$$P[\bar{X} < 10.658 \mid \mu = 11] = P\left[\frac{\bar{X} - 11}{4/10} < \frac{10.658 - 11}{4/10}\right] = P[Z < -0.855] = 0.196$$

If the choice of sample size is flexible, then one can specify both $\alpha$ and $\beta$ in this problem and determine what sample size is necessary.


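More generally, we may wish to test $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$ (where $\mu_1 > \mu_0$) at the $\alpha$ significance level. A test based on the test statistic

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \tag{12.1.1}$$

is equivalent to one using $\bar{X}$, so we may conveniently express our test as rejecting $H_0$ if $z_0 \ge z_{1-\alpha}$, where $z_0$ is the computed value of $Z_0$. Clearly, under $H_0$, $P[Z_0 \ge z_{1-\alpha}] = \alpha$, and we have a critical region of size $\alpha$. The probability of Type II error for the alternative $\mu_1$ is

$$\begin{aligned} P[\mathrm{TII}] &= P\left[\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < z_{1-\alpha} \,\middle|\, \mu = \mu_1\right] \\ &= P\left[\frac{\bar{X} - \mu_1}{\sigma/\sqrt{n}} < z_{1-\alpha} + \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_1\right] \end{aligned}$$

so that

$$P[\mathrm{TII}] = \Phi\left(z_{1-\alpha} + \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}}\right) \tag{12.1.2}$$

The sample size $n$ that will render P[TII] $= \beta$ is the solution to

$$z_{1-\alpha} + \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} = z_\beta = -z_{1-\beta} \tag{12.1.3}$$

This gives

$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2\sigma^2}{(\mu_0 - \mu_1)^2} \tag{12.1.4}$$

For $\alpha = 0.05$, $\beta = 0.10$, $\mu_0 = 10$, $\mu_1 = 11$, and $\sigma = 4$, we obtain $n = (1.645 + 1.282)^2(16)/1 = 137$.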
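Equations (12.1.2) and (12.1.4) can be evaluated together in a short Python sketch, using exact normal percentiles from `statistics.NormalDist` instead of the rounded tabled values 1.645 and 1.282. The function names are illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(mu0, mu1, sigma, alpha, beta):
    """Equation (12.1.4): n that gives P[TI] = alpha and P[TII] = beta."""
    z = NormalDist().inv_cdf
    n_exact = (z(1 - alpha) + z(1 - beta)) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2
    return n_exact, math.ceil(n_exact)

def type2_error(mu0, mu1, sigma, n, alpha):
    """Equation (12.1.2) for an upper-tail test with mu1 > mu0."""
    std = NormalDist()
    return std.cdf(std.inv_cdf(1 - alpha) + (mu0 - mu1) / (sigma / math.sqrt(n)))

n_exact, n_up = required_sample_size(10, 11, 4, alpha=0.05, beta=0.10)
print(f"n = {n_exact:.2f}; rounded up: {n_up}; "
      f"P[TII] at that n = {type2_error(10, 11, 4, n_up, 0.05):.4f}")
```

With exact percentiles the formula gives $n \approx 137.02$. Rounding up to 138, a common convention, keeps P[TII] at or below 0.10; the text reports 137, which corresponds to rounding 137.08 (the value obtained from the tabled percentiles) to the nearest integer.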

In considering the error probabilities of a test, it sometimes is convenient to
use the “power function” of the test. 


Definition 12.1.4


The power function, $\pi(\theta)$, of a test of $H_0$ is the probability of rejecting $H_0$ when the true value of the parameter is $\theta$.


For simple hypotheses $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$, we have $\pi(\theta_0) = P[\text{TI}] = \alpha$ and $\pi(\theta_1) = 1 - P[\text{TII}] = 1 - \beta$. For composite hypotheses, say $H_0: \theta \in \Omega_0$ versus $H_a: \theta \in \Omega - \Omega_0$, the size of the test (or critical region) is

$$\alpha = \max_{\theta \in \Omega_0} \pi(\theta) \qquad (12.1.5)$$

and if the true value $\theta$ falls in $\Omega - \Omega_0$, then $\pi(\theta) = 1 - P[\text{TII}]$, where we note that $P[\text{TII}]$ depends on $\theta$. In other words, the value of the power function is always the area under the pdf of the test statistic and over the critical region, giving $P[\text{TI}]$ for values of $\theta$ in the null hypothesis and $1 - P[\text{TII}]$ for values of $\theta$ in the alternative hypothesis. This is illustrated for a test of means in Figure 12.2. In the next section, tests concerning the mean of a normal distribution will further illustrate the notion of composite hypotheses.




FIGURE 12.2 The relationship of the power function to the probability of a Type II error


12.2 COMPOSITE HYPOTHESES


Again, we assume that $X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known, and we wish to test $H_0: \mu = \mu_0$ against the composite alternative $H_a: \mu > \mu_0$. It was suggested in the previous example that the critical region should be located on the right-hand tail for any alternative $\mu_1 > \mu_0$, but the critical value $c$ did not depend on the value of $\mu_1$. Thus, it is clear that the test for the simple alternative also is valid for this composite alternative. A test at significance level $\alpha$ still would reject $H_0$ if

$$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \ge z_{1-\alpha} \qquad (12.2.1)$$


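The power of this test at any value $\mu$ is

$$\pi(\mu) = P\left[\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge z_{1-\alpha} \,\middle|\, \mu\right] = P\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} \,\middle|\, \mu\right]$$

so that

$$\pi(\mu) = 1 - \Phi\left(z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) \qquad (12.2.2)$$

For $\mu = \mu_0$ we have $\pi(\mu_0) = \alpha$, and for $\mu > \mu_0$ we have $\pi(\mu) = 1 - P[\text{TII}]$.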

We also may consider a composite null hypothesis. Suppose that we wish to test $H_0: \mu \le \mu_0$ against $H_a: \mu > \mu_0$, and we reject $H_0$ if inequality (12.2.1) is satisfied. This is still a size $\alpha$ test for the composite null hypothesis. The probability of rejecting $H_0$ for any $\mu \le \mu_0$ is $\pi(\mu)$, and $\pi(\mu) \le \pi(\mu_0) = \alpha$ for $\mu \le \mu_0$, and thus the size is $\max_{\mu \le \mu_0} \pi(\mu) = \alpha$. That is, if the critical region is chosen to have size $\alpha$ at $\mu_0$, then the Type I error will be less than $\alpha$ for any $\mu < \mu_0$, so the original critical region still is appropriate here. Thus, the $\alpha$ level tests developed for simple null hypotheses often are applicable to the more realistic composite hypotheses, and $P[\text{TI}]$ will be no worse than $\alpha$. The general shape of the power function is given in Figure 12.3 for $n = 20$ and $n = 100$.

FIGURE 12.3 Comparison of power functions for tests based on two different sample sizes ($\pi(\mu) = P[\text{reject } H_0 \mid \mu]$ plotted against $\mu$, with $\mu_0$ marked)

From Figure 12.3, it is rather obvious why failure to reject $H_0$ should not be interpreted as acceptance of $H_0$. In particular, we always can find an alternative value of $\mu$ sufficiently close to $\mu_0$ so that the power of the test, $\pi(\mu)$, is arbitrarily close to $\alpha$. This would not be a serious problem in practice if we could determine an indifference region, which is a subset of the alternative on which we are willing to tolerate low power. In other words, it might not be too important to detect alternative values of $\mu$ that are close to $\mu_0$. In our example, we may not be too concerned about failing to reject $H_0$ when $\mu$ is in some small interval, say $(\mu_0, \mu_1)$, which we could take as our indifference region. When $\mu \ge \mu_1$, a sample size can be determined from equation (12.1.4) that will provide power $\pi(\mu) \ge 1 - \beta$. That is, for alternative values outside of the indifference region, a test can be constructed that will achieve or exceed prescribed error rates for both types of error.


P-VALUE 


There is not always general agreement about how small $\alpha$ should be for rejection of $H_0$ to constitute strong evidence in support of $H_a$. Experimenter 1 may consider $\alpha = 0.05$ sufficiently small, while experimenter 2 insists on using $\alpha = 0.01$. Thus, it would be possible for experimenter 1 to reject when experimenter 2 fails to reject, based on the same data. If the experimenters agree to use the same test statistic, then this problem may be overcome by reporting the results of the experiment in terms of the observed size or p-value of the test, which is defined as the smallest size $\alpha$ at which $H_0$ can be rejected, based on the observed value of the test statistic.


Example 12.2.1 On the basis of a sample of size $n = 25$ from a normal distribution, $X_i \sim N(\mu, 16)$, we wish to test $H_0: \mu = 10$ versus $H_a: \mu > 10$. Suppose that we observe $\bar{x} = 11.40$. The p-value is $P[\bar{X} \ge 11.40 \mid \mu = 10] = 1 - \Phi(1.75) = 1 - 0.9599 = 0.0401$. Because $0.01 < 0.0401 < 0.05$, the test would reject at the 0.05 level but not at the 0.01 level. If the p-value is reported, then interested readers can apply their own criteria.
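As a check on Example 12.2.1, the following minimal sketch (standard library only; the helper name is illustrative) computes the p-value and applies both experimenters' criteria. $H_0$ is rejected at level $\alpha$ exactly when the p-value is at most $\alpha$.

```python
from math import sqrt
from statistics import NormalDist


def upper_p_value(xbar, mu0, sigma, n):
    """Observed size P[Xbar >= xbar | mu = mu0] for H0: mu = mu0 vs Ha: mu > mu0, sigma known."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z0)


p = upper_p_value(xbar=11.40, mu0=10, sigma=4, n=25)
print(f"p-value = {p:.4f}")  # p-value = 0.0401
for alpha in (0.05, 0.01):
    # reject H0 at level alpha exactly when p <= alpha
    print(alpha, "reject" if p <= alpha else "do not reject")
```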


To test $H_0: \mu \ge \mu_0$ against $H_a: \mu < \mu_0$, similar results are obtained by rejecting $H_0$ if

$$z_0 \le -z_{1-\alpha} \qquad (12.2.3)$$

That is, the critical region of size $\alpha$ now is taken on the left-hand tail of the distribution of the test statistic. These tests are known as one-sided tests of hypotheses. The test with a critical region of form (12.2.1) is called an upper one-sided test, and the form (12.2.3) corresponds to a lower one-sided test.

Another common type of test involves a two-sided alternative. We may wish to test $H_0: \mu = \mu_0$ against the alternative $H_a: \mu \ne \mu_0$. If we choose the right-hand tail for our critical region, then we will have good power for rejecting $H_0$ when $\mu > \mu_0$, but we will have poor power when $\mu < \mu_0$. Similarly, if we choose the left-hand tail for the critical region, we will have good power if $\mu < \mu_0$, but poor power if $\mu > \mu_0$. A good compromise is to use a two-sided critical region and reject $H_0$ if

$$z_0 \le -z_{1-\alpha/2} \quad \text{or} \quad z_0 \ge z_{1-\alpha/2} \qquad (12.2.4)$$

It is reasonable to use an equal-tailed test (each tail of size $\alpha/2$) in this case because of symmetry considerations, and it is common practice to use equal tails, for the sake of convenience, in most two-sided tests.

The power function for the two-sided test is

$$\pi(\mu) = 1 - P[-z_{1-\alpha/2} < Z_0 < z_{1-\alpha/2} \mid \mu]$$

which gives

$$\pi(\mu) = 1 - \Phi\left(z_{1-\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) + \Phi\left(-z_{1-\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) \qquad (12.2.5)$$

If $\mu > \mu_0$, then, as suggested in Figure 12.4, the last normal probability term in the above power function will be near zero. Similarly, if $\mu < \mu_0$, then one would not expect to reject $H_0$ by getting a significantly large $\bar{x}$, and the first normal probability term would be near zero. If the appropriate small term is ignored, then the sample size formula for the two-sided test is approximately given by equation (12.1.4) with $z_{1-\alpha}$ replaced by $z_{1-\alpha/2}$ and $\mu_1$ replaced by $\mu$.
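A minimal sketch of the two-sided power function (12.2.5) and the approximate sample size described above, using the values $\mu_0 = 10$, $\sigma = 4$ from the running example and, as an illustrative assumption, $\alpha = 0.05$, $\beta = 0.10$ at $\mu = 11$. Function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf
z = NormalDist().inv_cdf


def two_sided_power(mu, mu0, sigma, n, alpha):
    """Equation (12.2.5) for the equal-tailed two-sided z test."""
    shift = (mu0 - mu) / (sigma / sqrt(n))
    zc = z(1 - alpha / 2)
    return 1 - Phi(zc + shift) + Phi(-zc + shift)


def two_sided_sample_size(mu, mu0, sigma, alpha, beta):
    """Approximate n: equation (12.1.4) with z_{1-alpha} -> z_{1-alpha/2} and mu1 -> mu."""
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2 * sigma ** 2 / (mu0 - mu) ** 2


n = ceil(two_sided_sample_size(mu=11, mu0=10, sigma=4, alpha=0.05, beta=0.10))
print(n, round(two_sided_power(11, 10, 4, n, 0.05), 3))  # 169 0.901
```

Because the neglected tail term is tiny here, the approximate sample size still delivers power just above the target 0.90.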

At this point, it is convenient to observe a connection between confidence intervals and hypothesis testing. In the two-sided test above, let us determine, for an observed value $\bar{x}$, which hypothesized values of $\mu_0$ would not have been rejected. We see from expression (12.2.4) that these are the values of $\mu_0$ that satisfy

$$\bar{x} - z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu_0 < \bar{x} + z_{1-\alpha/2}\,\sigma/\sqrt{n} \qquad (12.2.6)$$

FIGURE 12.4 Critical region for a two-sided test (tails of area $\alpha/2$ below $\mu_0 - z_{1-\alpha/2}\sigma/\sqrt{n}$ and above $\mu_0 + z_{1-\alpha/2}\sigma/\sqrt{n}$)


That is, the set of values of $\mu_0$ that are in the “acceptance region” (nonrejection region) of the test is the same as our earlier $100(1 - \alpha)\%$ confidence interval for $\mu$. The values of $\mu_0$ in the “acceptance regions” of the one-sided tests correspond to the one-sided confidence intervals we discussed earlier. Indeed, one could carry out a test of $H_0: \mu = \mu_0$ by computing the corresponding confidence interval and rejecting $H_0$ if the interval does not contain $\mu_0$.
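The duality between (12.2.4) and (12.2.6) can be checked numerically. This sketch reuses the data of Example 12.2.1 ($\bar{x} = 11.40$, $\sigma = 4$, $n = 25$) with a two-sided test at $\alpha = 0.05$, an illustrative choice not made in the example itself, and confirms over a grid of $\mu_0$ values that the test rejects exactly when $\mu_0$ falls outside the interval. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf


def z_interval(xbar, sigma, n, alpha):
    """Two-sided 100(1 - alpha)% confidence interval (12.2.6) for mu, sigma known."""
    half_width = z(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def rejects(xbar, mu0, sigma, n, alpha):
    """Two-sided test (12.2.4): reject H0: mu = mu0 when |z0| >= z_{1-alpha/2}."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z0) >= z(1 - alpha / 2)


xbar, sigma, n, alpha = 11.40, 4, 25, 0.05
lo, hi = z_interval(xbar, sigma, n, alpha)
# The test rejects mu0 exactly when mu0 lies outside the open interval (lo, hi).
agree = all(
    rejects(xbar, mu0, sigma, n, alpha) == (not lo < mu0 < hi)
    for mu0 in (k / 10 for k in range(80, 151))
)
print(round(lo, 3), round(hi, 3), agree)  # 9.832 12.968 True
```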


TESTS FOR THE NORMAL DISTRIBUTION 


In this section, we will state theorems that summarize the common test pro- 
cedures for the parameters of a normal distribution. In a later section, we will 
show that some of these tests have optimal properties. 


TESTS FOR THE MEAN ($\sigma^2$ KNOWN)


The results discussed in the previous section are summarized in the following theorem.


Theorem 12.3.1 Suppose that $x_1, \ldots, x_n$ is an observed random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known, and let

$$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \qquad (12.3.1)$$

1. A size $\alpha$ test of $H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$ is to reject $H_0$ if $z_0 \ge z_{1-\alpha}$. The power function for this test is
$$\pi(\mu) = 1 - \Phi\left(z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) \qquad (12.3.2)$$
2. A size $\alpha$ test of $H_0: \mu \ge \mu_0$ versus $H_a: \mu < \mu_0$ is to reject $H_0$ if $z_0 \le -z_{1-\alpha}$. The power function for this test is
$$\pi(\mu) = \Phi\left(-z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) \qquad (12.3.3)$$
3. A size $\alpha$ test of $H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$ is to reject $H_0$ if $z_0 \le -z_{1-\alpha/2}$ or $z_0 \ge z_{1-\alpha/2}$.
4. The sample size required to achieve a size $\alpha$ test with power $1 - \beta$ for an alternative value $\mu$ is given by
$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu)^2} \qquad (12.3.4)$$
for a one-sided test, and
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu)^2} \qquad (12.3.5)$$
for a two-sided test.


TESTS FOR THE MEAN ($\sigma^2$ UNKNOWN)


In most practical applications it is not possible to assume that the variance is known. It is clear that the pivotal quantities and other statistics considered in developing confidence intervals can be applied to the associated hypothesis-testing problems. Later we will discuss general techniques for deriving statistical tests.

Tests for $\mu$ with $\sigma$ unknown can be based on Student's $t$ statistic; they are similar to the tests based on the standard normal test statistic for the case in which the variance is known, with $\sigma^2$ replaced by the observed sample variance $s^2$.


Theorem 12.3.2 Let $x_1, \ldots, x_n$ be an observed random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown, and let

$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \qquad (12.3.6)$$

1. A size $\alpha$ test of $H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$ is to reject $H_0$ if $t_0 \ge t_{1-\alpha}(n - 1)$.
2. A size $\alpha$ test of $H_0: \mu \ge \mu_0$ versus $H_a: \mu < \mu_0$ is to reject $H_0$ if $t_0 \le -t_{1-\alpha}(n - 1)$.
3. A size $\alpha$ test of $H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$ is to reject $H_0$ if $t_0 \le -t_{1-\alpha/2}(n - 1)$ or $t_0 \ge t_{1-\alpha/2}(n - 1)$.


The power function and the related sample size problem are more complicated than in the case where $\sigma^2$ is known. For part 1 of the theorem, for an alternative $\mu > \mu_0$, the power function is

$$\pi(\mu) = P\left[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge t_{1-\alpha}(n - 1) \,\middle|\, \mu\right] = P\left[\frac{(\bar{X} - \mu)/(\sigma/\sqrt{n}) + \sqrt{n}(\mu - \mu_0)/\sigma}{\sqrt{S^2/\sigma^2}} \ge t_{1-\alpha}(n - 1) \,\middle|\, \mu\right]$$

so that

$$\pi(\mu) = P\left[\frac{Z + \delta}{\sqrt{V/\nu}} \ge t_{1-\alpha}(\nu)\right] \qquad (12.3.7)$$

where $\nu = n - 1$, $\delta = \sqrt{n}(\mu - \mu_0)/\sigma$, and $Z$ and $V$ are independent, with $Z \sim N(0, 1)$ and $V = (n - 1)S^2/\sigma^2 \sim \chi^2(\nu)$. The random variable in equation (12.3.7) is said to have a noncentral $t$ distribution with $\nu$ degrees of freedom and noncentrality parameter $\delta$. It has the usual Student's $t$ distribution only when $\delta = 0$. Otherwise, the distribution is rather hard to evaluate. Tables of the noncentral $t$ distribution are available, and $\pi(\mu)$ can be determined from these for given values of $\delta$. Similarly, the sample size required to give a desired power can be determined for specified values of $\delta$. This can be quite useful if approximate values or previous estimates of $\sigma$ are available. Table 8 (Appendix C) gives the sample size required to achieve $\pi(\mu) = 1 - \beta$ in terms of $d = |\delta|/\sqrt{n} = |\mu - \mu_0|/\sigma$ for the tests described in Theorem 12.3.2.


Example 12.3.1 It is desired to test, at the $\alpha = 0.05$ level, $H_0: \mu = 10$ versus $H_a: \mu > 10$ for a normal distribution, $N(\mu, \sigma^2)$ with $\sigma^2$ unknown, and we wish to have power 0.99 if the true value $\mu$ is two standard deviations greater than $\mu_0 = 10$. In other words, $\pi(\mu) = 0.99$ when $d = |\mu - \mu_0|/\sigma = 2$. From Table 8 (Appendix C), we obtain $n = 6$.
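Without noncentral $t$ tables, one rough way to check the Table 8 entry is simulation. The sketch below estimates $\pi(\mu)$ for the test in part 1 of Theorem 12.3.2 by Monte Carlo. Because $t_0$ depends on the parameters only through $\delta$, it takes $\mu_0 = 0$ and $\sigma = 1$ without loss of generality. The critical value $t_{0.95}(5) = 2.015$ is taken from a standard $t$ table, and the seed and replication count are arbitrary choices; the result is an estimate, not an exact power.

```python
import random
from math import sqrt


def simulated_power(n, d, t_crit, reps=20000, seed=12345):
    """Monte Carlo estimate of pi(mu) for the upper one-sided t test of Theorem 12.3.2.

    Uses mu0 = 0 and sigma = 1, so the true mean is mu = d = (mu - mu0) / sigma.
    t_crit must be t_{1-alpha}(n - 1), supplied from a t table.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(d, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        t0 = xbar / (s / sqrt(n))  # (xbar - mu0) / (s / sqrt(n)) with mu0 = 0
        if t0 >= t_crit:
            rejections += 1
    return rejections / reps


# Example 12.3.1: alpha = 0.05, d = 2, n = 6, and t_{0.95}(5) = 2.015 from a t table
power = simulated_power(n=6, d=2.0, t_crit=2.015)
print(power)
```

The estimate should land close to the target power 0.99, consistent with the tabled $n = 6$.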


Tests also can be constructed for other parameters such as the variance. 
TEST FOR VARIANCE


It is possible to construct tests of hypotheses such as $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 > \sigma_0^2$ based on the test statistic

$$V_0 = (n - 1)S^2/\sigma_0^2 \qquad (12.3.8)$$

because $V_0 \sim \chi^2(n - 1)$ when $H_0$ is true. An observed value of the sample variance, $s^2$, that is large relative to $\sigma_0^2$ would support $H_a$. This suggests choosing the right-hand tail of the distribution of $V_0$ as the critical region for such a test. This test would be very useful for deciding whether the amount of variability in a population is excessive relative to some standard value, $\sigma_0^2$. Similar remarks apply for a composite null hypothesis of the form $H_0: \sigma^2 \le \sigma_0^2$.

In the following theorem, $H(c; \nu)$ is the CDF of $\chi^2(\nu)$.


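Theorem 12.3.3 Let $x_1, \ldots, x_n$ be an observed random sample from $N(\mu, \sigma^2)$, and let

$$v_0 = (n - 1)s^2/\sigma_0^2 \qquad (12.3.9)$$

1. A size $\alpha$ test of $H_0: \sigma^2 \le \sigma_0^2$ versus $H_a: \sigma^2 > \sigma_0^2$ is to reject $H_0$ if $v_0 \ge \chi^2_{1-\alpha}(n - 1)$. The power function for this test is
$$\pi(\sigma^2) = 1 - H\left[(\sigma_0^2/\sigma^2)\,\chi^2_{1-\alpha}(n - 1);\; n - 1\right] \qquad (12.3.10)$$
2. A size $\alpha$ test of $H_0: \sigma^2 \ge \sigma_0^2$ versus $H_a: \sigma^2 < \sigma_0^2$ is to reject $H_0$ if $v_0 \le \chi^2_{\alpha}(n - 1)$. The power function for this test is
$$\pi(\sigma^2) = H\left[(\sigma_0^2/\sigma^2)\,\chi^2_{\alpha}(n - 1);\; n - 1\right] \qquad (12.3.11)$$
3. A size $\alpha$ test of $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 \ne \sigma_0^2$ is to reject $H_0$ if $v_0 \le \chi^2_{\alpha/2}(n - 1)$ or $v_0 \ge \chi^2_{1-\alpha/2}(n - 1)$.

Proof

We will derive the power function for part 1 and leave the other details as an exercise.

$$\begin{aligned} \pi(\sigma^2) &= P[V_0 \ge \chi^2_{1-\alpha}(n - 1) \mid \sigma^2] \\ &= P[(n - 1)S^2/\sigma^2 \ge (\sigma_0^2/\sigma^2)\,\chi^2_{1-\alpha}(n - 1) \mid \sigma^2] \\ &= 1 - H[(\sigma_0^2/\sigma^2)\,\chi^2_{1-\alpha}(n - 1);\; n - 1] \end{aligned}$$

Notice that, in particular, $\pi(\sigma_0^2) = 1 - H[\chi^2_{1-\alpha}(n - 1); n - 1] = 1 - (1 - \alpha) = \alpha$, and because $\pi(\sigma^2)$ is increasing in $\sigma^2$, the size of the critical region is $\alpha$.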


In practice, it is convenient to use an equal-tailed two-sided test as described in part 3, but unequal tail values $\alpha_1$ and $\alpha_2$ with $\alpha = \alpha_1 + \alpha_2$ may be desirable in some situations.
It is possible to solve for a sample size to achieve prescribed levels of size and power for these tests. For example, for the test of part 1, we may wish to have $\pi(\sigma_1^2) = 1 - \beta$ for some $\sigma_1^2 > \sigma_0^2$. This requires finding the value $n$ such that

$$(\sigma_0^2/\sigma_1^2)\,\chi^2_{1-\alpha}(n - 1) = \chi^2_{\beta}(n - 1) \qquad (12.3.12)$$

This cannot be solved explicitly for $n$, but an iterative procedure based on the percentiles in Table 4 (Appendix C) can be used for specified values of $\alpha$, $\beta$, and $\sigma_0^2/\sigma_1^2$. It also would be desirable to have an approximate formula that gives $n$ explicitly as a function of these values. Such an approximation can be based on the normal approximation $\chi^2_\gamma(\nu) \approx \nu + z_\gamma\sqrt{2\nu}$, which was given in Chapter 8. If this approximation is used in both sides of equation (12.3.12), then it is possible to derive the approximation

$$n \approx 1 + 2\left[\frac{z_{1-\alpha}(\sigma_0/\sigma_1)^2 - z_\beta}{1 - (\sigma_0/\sigma_1)^2}\right]^2 \qquad (12.3.13)$$

We desire to test $H_0: \sigma^2 \le 16$ versus $H_a: \sigma^2 > 16$ with size $\alpha = 0.10$ and power $1 - \beta = 0.75$ when $\sigma^2 = 32$ is true. Based on approximation (12.3.13) with $z_{1-\alpha} = 1.282$, $z_\beta = -0.674$, and $(\sigma_0/\sigma_1)^2 = 0.5$, we obtain $n \approx 15$. If we compute both sides of equation (12.3.12) for values of $\nu$ close to $\nu = 15 - 1 = 14$, we obtain the best agreement when $\nu = 15$, which corresponds to $n = 16$.
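The iterative search and approximation (12.3.13) can be sketched in Python using only the standard library. Because the standard library has no chi-square distribution, the sketch assumes its own CDF $H(x; \nu)$ (a power series for the regularized incomplete gamma function, adequate for the moderate arguments used here) and finds percentiles by bisection. Like the text, it picks the $\nu$ giving the best agreement of the two sides of (12.3.12). All function names are illustrative.

```python
import math
from statistics import NormalDist


def chi2_cdf(x, v):
    """H(x; v): chi-square CDF via the power series of the regularized lower
    incomplete gamma function P(v/2, x/2); adequate for moderate x and v."""
    if x <= 0:
        return 0.0
    a, t = v / 2.0, x / 2.0
    term = total = 1.0 / a
    k = a
    while term > total * 1e-16:
        k += 1.0
        term *= t / k
        total += term
    return min(1.0, total * math.exp(a * math.log(t) - t - math.lgamma(a)))


def chi2_quantile(p, v):
    """chi^2_p(v), found by bisection on chi2_cdf."""
    lo, hi = 0.0, v + 20.0 * math.sqrt(2.0 * v) + 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_test_power(ratio, v, alpha):
    """Equation (12.3.10) with ratio = sigma0^2 / sigma^2 and v = n - 1."""
    return 1.0 - chi2_cdf(ratio * chi2_quantile(1 - alpha, v), v)


def approx_sample_size(alpha, beta, ratio):
    """Approximation (12.3.13), with ratio = (sigma0 / sigma1)^2."""
    z = NormalDist().inv_cdf
    return 1 + 2 * ((z(1 - alpha) * ratio - z(beta)) / (1 - ratio)) ** 2


alpha, beta, ratio = 0.10, 0.25, 16 / 32
# Degrees of freedom giving the closest agreement of the two sides of (12.3.12)
best_v = min(range(10, 21),
             key=lambda v: abs(ratio * chi2_quantile(1 - alpha, v) - chi2_quantile(beta, v)))
print(round(approx_sample_size(alpha, beta, ratio), 2), best_v, best_v + 1)  # 14.84 15 16
print(round(variance_test_power(ratio, best_v, alpha), 3))
```

The second line prints the power actually attained at $\nu = 15$, which is close to but not exactly 0.75; requiring power of at least 0.75 rather than best agreement could change the chosen $n$.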


TWO-SAMPLE TESTS 


It is possible to construct tests of hypotheses concerning the variances of two normal distributions, such as $H_0: \sigma_2^2/\sigma_1^2 = d_0$, based on an $F$ statistic. In particular, consider the test statistic

$$F_0 = \frac{S_1^2}{S_2^2}\, d_0 \qquad (12.3.14)$$

where $F_0 \sim F(n_1 - 1, n_2 - 1)$ if $H_0$ is true.

Theorem 12.3.4 Suppose that $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$ are observed values of independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, and let

$$f_0 = \frac{s_1^2}{s_2^2}\, d_0 \qquad (12.3.15)$$

1. A size $\alpha$ test of $H_0: \sigma_2^2/\sigma_1^2 \le d_0$ versus $H_a: \sigma_2^2/\sigma_1^2 > d_0$ is to reject $H_0$ if $f_0 \le 1/f_{1-\alpha}(n_2 - 1, n_1 - 1)$.
2. A size $\alpha$ test of $H_0: \sigma_2^2/\sigma_1^2 \ge d_0$ versus $H_a: \sigma_2^2/\sigma_1^2 < d_0$ is to reject $H_0$ if $f_0 \ge f_{1-\alpha}(n_1 - 1, n_2 - 1)$.
3. A size $\alpha$ test of $H_0: \sigma_2^2/\sigma_1^2 = d_0$ versus $H_a: \sigma_2^2/\sigma_1^2 \ne d_0$ is to reject $H_0$ if $f_0 \le 1/f_{1-\alpha/2}(n_2 - 1, n_1 - 1)$ or $f_0 \ge f_{1-\alpha/2}(n_1 - 1, n_2 - 1)$.


If the variances are unknown but equal, then tests of hypotheses concerning the means such as $H_0: \mu_2 - \mu_1 = d_0$ can be constructed based on the $t$ distribution. In particular, let

$$t_0 = \frac{\bar{y} - \bar{x} - d_0}{s_p\sqrt{1/n_1 + 1/n_2}} \qquad (12.3.16)$$

where $s_p^2$ is the pooled estimate defined by equation (11.5.7).

Theorem 12.3.5 Suppose that $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$ are observed values of independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, where $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

1. A size $\alpha$ test of $H_0: \mu_2 - \mu_1 \le d_0$ versus $H_a: \mu_2 - \mu_1 > d_0$ is to reject $H_0$ if $t_0 \ge t_{1-\alpha}(n_1 + n_2 - 2)$.
2. A size $\alpha$ test of $H_0: \mu_2 - \mu_1 \ge d_0$ versus $H_a: \mu_2 - \mu_1 < d_0$ is to reject $H_0$ if $t_0 \le -t_{1-\alpha}(n_1 + n_2 - 2)$.
3. A size $\alpha$ test of $H_0: \mu_2 - \mu_1 = d_0$ versus $H_a: \mu_2 - \mu_1 \ne d_0$ is to reject $H_0$ if $t_0 \le -t_{1-\alpha/2}(n_1 + n_2 - 2)$ or $t_0 \ge t_{1-\alpha/2}(n_1 + n_2 - 2)$.
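A minimal sketch of the statistic (12.3.16). It assumes the usual pooled estimate $s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$, which we take to be the form of (11.5.7). The data are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pooled_t(x, y, d0=0.0):
    """t0 of equation (12.3.16) for hypotheses about mu2 - mu1, assuming equal variances.

    x is the sample from population 1 and y the sample from population 2.
    Returns (t0, degrees of freedom n1 + n2 - 2).
    """
    n1, n2 = len(x), len(y)
    xbar, ybar = sum(x) / n1, sum(y) / n2
    ss1 = sum((xi - xbar) ** 2 for xi in x)  # (n1 - 1) s1^2
    ss2 = sum((yi - ybar) ** 2 for yi in y)  # (n2 - 1) s2^2
    sp = sqrt((ss1 + ss2) / (n1 + n2 - 2))   # pooled standard deviation
    return (ybar - xbar - d0) / (sp * sqrt(1 / n1 + 1 / n2)), n1 + n2 - 2


# Hypothetical data, for illustration only
t0, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t0, 3), df)  # 3.674 4
```

With $t_{0.975}(4) = 2.776$ from a $t$ table, part 3 of Theorem 12.3.5 would reject $H_0: \mu_2 - \mu_1 = 0$ at $\alpha = 0.05$ for these made-up data.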


The power functions for these tests involve the noncentral $t$ distribution. It is possible to determine equal sample sizes $n_1 = n_2 = n$ for a one-sided size $\alpha$ test with power $1 - \beta$ by using Table 8 (Appendix C) with $d' = |\mu_2 - \mu_1|/\sigma$. For a two-sided test, the size is $2\alpha$. Again, it is necessary to have a value to use for $\sigma$ or else to express the difference in standard deviations. Of course, the test will have the proper size whatever $n$ is used.

For unequal variances, an approximate test can be constructed based on Welch's approximate $t$ statistic as given by equation (11.5.13). Similarly, tests for other cases such as the paired sample case can be set up easily.


PAIRED-SAMPLE t TEST 


All of the above tests assume that the random samples are independent. As noted in Chapter 11, there are situations in which an experiment involves only one set of individuals or experimental objects, and two observations are made on each individual or object. For example, one possible way to test the effectiveness of a diet plan would be to weigh each one of a set of $n$ individuals both before and after the diet period. The result would be paired data $(x_1, y_1), \ldots, (x_n, y_n)$ with $x_i$ and $y_i$ the weight measurements, respectively, before and after the diet, for the $i$th individual in the study. Of course, one might reasonably expect a dependence between observations within a pair because they are measurements on the same individual.

We will assume that such paired data are observations on a set of independent pairs of random variables from a bivariate population, $(X, Y) \sim f(x, y)$, and that the differences $D = Y - X$ are normally distributed, $D \sim N(\mu_D, \sigma_D^2)$, with $\mu_D = E(Y) - E(X) = \mu_2 - \mu_1$. One possibility that leads to this situation is if the pairs are a random sample from a bivariate normal population. That is, if there is independence between pairs, and each has the same bivariate normal distribution, then the differences are independent normal with mean $\mu_2 - \mu_1$. Thus, a test based on the $T$ variable of equation (11.5.17) applied to the differences $d_i = y_i - x_i$ yields a test of $\mu_2 - \mu_1$ with $\sigma_D^2$ unknown.


Denote by $(x_1, y_1), \ldots, (x_n, y_n)$ the observed values of $n$ independent pairs of random variables, $(X_1, Y_1), \ldots, (X_n, Y_n)$, and assume that the differences $D_i = Y_i - X_i$ are normally distributed, each with mean $\mu_D = \mu_2 - \mu_1$ and variance $\sigma_D^2$. Let $\bar{d}$ and $s_d^2$ be the sample mean and sample variance based on the differences $d_i = y_i - x_i$, for $i = 1, \ldots, n$, and let

$$t_0 = \frac{\bar{d} - d_0}{s_d/\sqrt{n}} \qquad (12.3.17)$$

1. A size $\alpha$ test of $H_0: \mu_2 - \mu_1 \le d_0$ versus $H_a: \mu_2 - \mu_1 > d_0$ is to reject $H_0$ if $t_0 \ge t_{1-\alpha}(n - 1)$.
2. A size $\alpha$ test of $H_0: \mu_2 - \mu_1 \ge d_0$ versus $H_a: \mu_2 - \mu_1 < d_0$ is to reject $H_0$ if $t_0 \le -t_{1-\alpha}(n - 1)$.
3. A size $\alpha$ test of $H_0: \mu_2 - \mu_1 = d_0$ versus $H_a: \mu_2 - \mu_1 \ne d_0$ is to reject $H_0$ if either $t_0 \le -t_{1-\alpha/2}(n - 1)$ or $t_0 \ge t_{1-\alpha/2}(n - 1)$.
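The paired test reduces to a one-sample $t$ computation on the differences. The sketch below computes (12.3.17) for a small set of hypothetical before and after weights. The data and the function name are illustrative assumptions only.

```python
from math import sqrt


def paired_t(x, y, d0=0.0):
    """t0 of equation (12.3.17), computed from the differences d_i = y_i - x_i.

    Returns (t0, degrees of freedom n - 1).
    """
    d = [yi - xi for xi, yi in zip(x, y)]
    n = len(d)
    dbar = sum(d) / n
    sd = sqrt(sum((di - dbar) ** 2 for di in d) / (n - 1))
    return (dbar - d0) / (sd / sqrt(n)), n - 1


# Hypothetical before (x) and after (y) weights for four individuals
t0, df = paired_t([200, 190, 185, 210], [195, 188, 180, 204])
print(round(t0, 3), df)  # -5.196 3
```

Since $t_{0.95}(3) = 2.353$, part 2 (a mean weight loss, $H_a: \mu_2 - \mu_1 < 0$) would reject $H_0$ at $\alpha = 0.05$ for these made-up data.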


BINOMIAL TESTS 


The techniques used to set confidence intervals for the binomial distribution also can be modified to obtain tests of hypotheses. Suppose that $X_i \sim \text{BIN}(1, p)$, $i = 1, \ldots, n$. Then tests for $p$ will be based on the sufficient statistic $S = \sum X_i \sim \text{BIN}(n, p)$.

Theorem 12.4.1 Let $S \sim \text{BIN}(n, p)$. For large $n$, an approximate size $\alpha$ test of $H_0: p \le p_0$ against $H_a: p > p_0$ is to reject $H_0$ if

$$z_0 = \frac{s - np_0}{\sqrt{np_0(1 - p_0)}} \ge z_{1-\alpha}$$


An approximate size $\alpha$ test of $H_0: p \ge p_0$ against $H_a: p < p_0$ is to reject $H_0$ if $z_0 \le -z_{1-\alpha}$.

An approximate size $\alpha$ test of $H_0: p = p_0$ against $H_a: p \ne p_0$ is to reject $H_0$ if $z_0 \le -z_{1-\alpha/2}$ or $z_0 \ge z_{1-\alpha/2}$.

These results follow from the fact that when $p = p_0$,

$$\frac{S - np_0}{\sqrt{np_0(1 - p_0)}} \xrightarrow{d} Z \sim N(0, 1) \qquad (12.4.1)$$

As in the previous one-sided examples, the probability of rejecting a true $H_0$ will be less than $\alpha$ for other values of $p$ in the null hypotheses.

Exact tests also can be based on $S$, analogous to the way exact confidence intervals were obtained using the general method. These tests also may be conservative in the sense that the size of the test actually may be less than $\alpha$ for all parameter values under $H_0$, because of the discreteness of the random variable.

Theorem 12.4.2 Suppose that $S \sim \text{BIN}(n, p)$, and $B(s; n, p)$ denotes a binomial CDF. Denote by $s$ an observed value of $S$.


1. A conservative size $\alpha$ test of $H_0: p \le p_0$ against $H_a: p > p_0$ is to reject $H_0$ if $1 - B(s - 1; n, p_0) \le \alpha$.
2. A conservative size $\alpha$ test of $H_0: p \ge p_0$ against $H_a: p < p_0$ is to reject $H_0$ if $B(s; n, p_0) \le \alpha$.
3. A conservative two-sided test of $H_0: p = p_0$ against $H_a: p \ne p_0$ is to reject $H_0$ if $B(s; n, p_0) \le \alpha/2$ or $1 - B(s - 1; n, p_0) \le \alpha/2$.


The concept of hypothesis testing is well illustrated by this model. If one is testing $H_0: p \ge p_0$, then $H_0$ is rejected if the observed $s$ is so small that it would have been very unlikely ($\le \alpha$) to have obtained such a small value of $s$ when $p = p_0$. Also, it is clear that the indicated critical regions have size $\le \alpha$.

In case 2, for example, if $p_0$ happens to be a value such that $B(s; n, p_0) = \alpha$, then the test will have exact size $\alpha$; otherwise it is conservative. Note also that $B(s; n, p_0)$, for example, is the p-value or observed size of the test in case 2, if one desires to use that concept.
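The p-values of the conservative tests in Theorem 12.4.2 need only the binomial CDF, which `math.comb` makes easy to compute exactly. The outcome below ($s = 9$ successes in $n = 10$ trials with $p_0 = 0.5$) is hypothetical, and the function names are illustrative.

```python
from math import comb


def binom_cdf(s, n, p):
    """B(s; n, p) = P[S <= s] for S ~ BIN(n, p); equals 0 when s < 0."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s + 1))


def exact_binomial_p_values(s, n, p0):
    """p-values of the conservative tests in Theorem 12.4.2.

    upper: 1 - B(s - 1; n, p0), for Ha: p > p0 (part 1)
    lower: B(s; n, p0),         for Ha: p < p0 (part 2)
    """
    return 1 - binom_cdf(s - 1, n, p0), binom_cdf(s, n, p0)


# Hypothetical outcome: s = 9 successes in n = 10 trials, p0 = 0.5
upper, lower = exact_binomial_p_values(9, 10, 0.5)
print(f"{upper:.4f} {lower:.4f}")  # 0.0107 0.9990
alpha = 0.05
print("two-sided reject:", upper <= alpha / 2 or lower <= alpha / 2)  # True
```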

It also is possible to construct tests of hypotheses about the equality of two population proportions. In particular, suppose that $X \sim \text{BIN}(n_1, p_1)$ and $Y \sim \text{BIN}(n_2, p_2)$, and that $X$ and $Y$ are independent. The MLEs are $\hat{p}_1 = X/n_1$ and $\hat{p}_2 = Y/n_2$. Under $H_0: p_1 = p_2$, it would seem appropriate to have a pooled estimate of their common value, say $\hat{p} = (X + Y)/(n_1 + n_2)$. It can be shown by the methods of Chapter 7 that if $p_1 = p_2$, then

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}} \xrightarrow{d} Z \sim N(0, 1) \qquad (12.4.2)$$

and consequently an approximate size $\alpha$ test of $H_0: p_1 = p_2$ versus $H_a: p_1 \ne p_2$ would reject $H_0$ if $z_0 \le -z_{1-\alpha/2}$ or $z_0 \ge z_{1-\alpha/2}$, where $z_0$ is the observed value of the statistic in (12.4.2).
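A minimal sketch of the approximate two-proportion test based on (12.4.2), with the pooled estimate $\hat{p}$ in the standard error. The counts are hypothetical and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, n1, y, n2):
    """Observed value of the statistic in (12.4.2), using the pooled estimate of p."""
    p1_hat, p2_hat = x / n1, y / n2
    p_hat = (x + y) / (n1 + n2)
    return (p1_hat - p2_hat) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))


# Hypothetical counts: 45 of 100 versus 30 of 100
z0 = two_proportion_z(45, 100, 30, 100)
crit = NormalDist().inv_cdf(1 - 0.05 / 2)
print(round(z0, 3), abs(z0) >= crit)  # 2.191 True
```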


12.5 POISSON TESTS


Tests for the mean of a Poisson distribution can be based on the sufficient statistic $S = \sum X_i$. In the following theorem, $F(x; \mu)$ is the CDF of $X \sim \text{POI}(\mu)$.

Theorem 12.5.1 Let $x_1, \ldots, x_n$ be an observed random sample from $\text{POI}(\mu)$, and let $s = \sum x_i$.
1. A conservative size $\alpha$ test of $H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$ is to reject $H_0$ if $1 - F(s - 1; n\mu_0) \le \alpha$.
2. A conservative size $\alpha$ test of $H_0: \mu \ge \mu_0$ versus $H_a: \mu < \mu_0$ is to reject $H_0$ if $F(s; n\mu_0) \le \alpha$.

Using the results in Exercise 18 of Chapter 8, it is possible to give tests in terms of chi-square percentiles. In particular, $H_0$ of part 1 is rejected if $2n\mu_0 \le \chi^2_{\alpha}(2s)$, and $H_0$ of part 2 is rejected if $2n\mu_0 \ge \chi^2_{1-\alpha}(2s + 2)$. A two-sided size $\alpha$ test of $H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$ would reject $H_0$ if $2n\mu_0 \le \chi^2_{\alpha/2}(2s)$ or $2n\mu_0 \ge \chi^2_{1-\alpha/2}(2s + 2)$. Again the concept of p-value may be useful in this problem, where for an observed $s$, $1 - F(s - 1; n\mu_0)$ and $F(s; n\mu_0)$ are the p-values for cases 1 and 2, respectively.
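The p-values of Theorem 12.5.1 require only the Poisson CDF, which can be accumulated term by term. The sample below ($n = 5$ counts with total $s = 10$, testing $\mu_0 = 1$) is hypothetical, and the function names are illustrative.

```python
from math import exp


def poisson_cdf(x, m):
    """F(x; m) = P[X <= x] for X ~ POI(m); equals 0 when x < 0."""
    term, total = exp(-m), 0.0
    for k in range(x + 1):
        if k > 0:
            term *= m / k  # P[X = k] from P[X = k - 1]
        total += term
    return total


def poisson_p_values(s, n, mu0):
    """p-values of parts 1 and 2 of Theorem 12.5.1: 1 - F(s - 1; n mu0) and F(s; n mu0)."""
    return 1 - poisson_cdf(s - 1, n * mu0), poisson_cdf(s, n * mu0)


# Hypothetical sample of n = 5 counts with total s = 10, testing mu0 = 1
upper, lower = poisson_p_values(s=10, n=5, mu0=1.0)
print(f"{upper:.4f} {lower:.4f}")  # 0.0318 0.9863
```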


12.6 MOST POWERFUL TESTS


In the previous sections, the terminology of hypothesis testing has been developed, and some intuitively appealing tests have been described based on pivotal quantities or appropriate sufficient statistics. In most cases these tests are closely related to analogous confidence intervals discussed in the previous chapter. The tests presented earlier were based on reasonable test statistics, but no rationale was provided to suggest that they are best in any sense. If confronted with choosing between two or more tests of the same size, it would seem reasonable to select the one with the greatest chance of detecting a true alternative value. In other words, the strategy will be to select a test with maximum power for alternative values of the parameter. We will approach this problem by considering a method for deriving critical regions corresponding to tests that are most powerful tests of a given size for testing simple hypotheses.

Let $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$, and consider a critical region $C$. The notation for the power function corresponding to $C$ is

$$\pi_C(\theta) = P[(X_1, \ldots, X_n) \in C \mid \theta] \qquad (12.6.1)$$


Definition 12.6.1

A test of $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$ based on a critical region $C^*$ is said to be a most powerful test of size $\alpha$ if

1. $\pi_{C^*}(\theta_0) = \alpha$, and
2. $\pi_{C^*}(\theta_1) \ge \pi_C(\theta_1)$ for any other critical region $C$ of size $\alpha$ [that is, $\pi_C(\theta_0) = \alpha$].

Such a critical region, $C^*$, is called a most powerful critical region of size $\alpha$.

The following theorem shows how to derive a most powerful critical region for testing simple hypotheses.


Theorem 12.6.1 Neyman-Pearson Lemma Suppose that $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$. Let

$$\lambda(x_1, \ldots, x_n; \theta_0, \theta_1) = \frac{f(x_1, \ldots, x_n; \theta_0)}{f(x_1, \ldots, x_n; \theta_1)} \qquad (12.6.2)$$

and let $C^*$ be the set

$$C^* = \{(x_1, \ldots, x_n) \mid \lambda(x_1, \ldots, x_n; \theta_0, \theta_1) \le k\} \qquad (12.6.3)$$

where $k$ is a constant such that

$$P[(X_1, \ldots, X_n) \in C^* \mid \theta_0] = \alpha \qquad (12.6.4)$$

Then $C^*$ is a most powerful critical region of size $\alpha$ for testing $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$.
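Proof

For convenience, we will adopt vector notation, $\mathbf{X} = (X_1, \ldots, X_n)$ and $\mathbf{x} = (x_1, \ldots, x_n)$. Also, if $A$ is an $n$-dimensional event, let

$$P[\mathbf{X} \in A \mid \theta] = \int \cdots \int_A f(x_1, \ldots, x_n; \theta)\, dx_1 \cdots dx_n \qquad (12.6.5)$$

for the continuous case. The discrete case would be similar, with integrals replaced by summations. We also will denote the complement of a set $C$ by $\bar{C}$. Note that if $A$ is a subset of $C^*$, then

$$P[\mathbf{X} \in A \mid \theta_0] \le k\,P[\mathbf{X} \in A \mid \theta_1] \qquad (12.6.6)$$

because $\int_A f(\mathbf{x}; \theta_0)\, d\mathbf{x} \le \int_A k f(\mathbf{x}; \theta_1)\, d\mathbf{x}$. Similarly, if $A$ is a subset of $\bar{C}^*$, then

$$P[\mathbf{X} \in A \mid \theta_0] \ge k\,P[\mathbf{X} \in A \mid \theta_1] \qquad (12.6.7)$$

Notice that for any critical region $C$ we have

$$C^* = (C^* \cap C) \cup (C^* \cap \bar{C}) \quad \text{and} \quad C = (C \cap C^*) \cup (C \cap \bar{C}^*)$$

Thus,

$$\pi_{C^*}(\theta) = P[\mathbf{X} \in C^* \cap C \mid \theta] + P[\mathbf{X} \in C^* \cap \bar{C} \mid \theta]$$

and

$$\pi_C(\theta) = P[\mathbf{X} \in C \cap C^* \mid \theta] + P[\mathbf{X} \in C \cap \bar{C}^* \mid \theta]$$

and the difference is

$$\pi_{C^*}(\theta) - \pi_C(\theta) = P[\mathbf{X} \in C^* \cap \bar{C} \mid \theta] - P[\mathbf{X} \in C \cap \bar{C}^* \mid \theta] \qquad (12.6.8)$$

Combining equation (12.6.8) with $\theta = \theta_1$ and inequalities (12.6.6) and (12.6.7), we have

$$\pi_{C^*}(\theta_1) - \pi_C(\theta_1) \ge (1/k)\{P[\mathbf{X} \in C^* \cap \bar{C} \mid \theta_0] - P[\mathbf{X} \in C \cap \bar{C}^* \mid \theta_0]\}$$

Again, using (12.6.8) with $\theta = \theta_0$ in the right side of this inequality, we obtain

$$\pi_{C^*}(\theta_1) - \pi_C(\theta_1) \ge (1/k)[\pi_{C^*}(\theta_0) - \pi_C(\theta_0)]$$

If $C$ is a critical region of size $\alpha$, then $\pi_{C^*}(\theta_0) - \pi_C(\theta_0) = \alpha - \alpha = 0$, so the right side of the last inequality is 0, and thus $\pi_{C^*}(\theta_1) \ge \pi_C(\theta_1)$.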


The general philosophy of the Neyman-Pearson approach to hypothesis testing is to put sample points into the critical region until it reaches size $\alpha$. To maximize power, points should be put into the critical region that are more likely under $H_a$ than under $H_0$. In particular, the Neyman-Pearson lemma says that the criterion for choosing sample points to be included should be based on the magnitude of the ratio of the likelihood functions under $H_0$ and $H_a$.


Example 12.6.1 Consider a random sample of size $n$ from an exponential distribution, $X_i \sim \text{EXP}(\theta)$. We wish to test $H_0: \theta = \theta_0$ against $H_a: \theta = \theta_1$, where $\theta_1 > \theta_0$. The Neyman-Pearson lemma says to reject $H_0$ if

$$\lambda(\mathbf{x}; \theta_0, \theta_1) = \frac{\theta_0^{-n}\exp\left(-\sum x_i/\theta_0\right)}{\theta_1^{-n}\exp\left(-\sum x_i/\theta_1\right)} \le k$$

where $k$ is such that $P[\lambda(\mathbf{X}; \theta_0, \theta_1) \le k] = \alpha$ under $\theta = \theta_0$. Now

$$P[\lambda(\mathbf{X}; \theta_0, \theta_1) \le k \mid \theta_0] = P\left[\sum X_i(1/\theta_1 - 1/\theta_0) \le \ln\left((\theta_0/\theta_1)^n k\right) \,\middle|\, \theta_0\right]$$

so that

$$P[\mathbf{X} \in C^* \mid \theta_0] = P\left[\sum X_i \ge k_1 \,\middle|\, \theta_0\right] \qquad (12.6.9)$$

where $k_1 = \ln\left((\theta_0/\theta_1)^n k\right)/(1/\theta_1 - 1/\theta_0)$. Notice that the direction of the inequality changed because $1/\theta_1 - 1/\theta_0 < 0$ in this case. Thus, a most powerful critical region has the form $C^* = \{(x_1, \ldots, x_n) \mid \sum x_i \ge k_1\}$. Notice that under $H_0: \theta = \theta_0$, we have $2\sum X_i/\theta_0 \sim \chi^2(2n)$, so that $k_1 = \theta_0\chi^2_{1-\alpha}(2n)/2$ would give a critical region of size $\alpha$, and an equivalent test would be to reject $H_0$ if $2\sum x_i/\theta_0 \ge \chi^2_{1-\alpha}(2n)$. The original constant $k$ could be computed if desired, but it is not necessary in order to perform the test.

Similarly, if we wish to test $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$ with $\theta_1 < \theta_0$, then the most powerful test of size $\alpha$ is to reject $H_0$ if $2\sum x_i/\theta_0 \le \chi^2_{\alpha}(2n)$. The only difference between the two tests comes about because of the difference in the sign of $1/\theta_1 - 1/\theta_0$ in the two cases. In other words, the right side of equation (12.6.9) becomes $P[\sum X_i \le k_1 \mid \theta_0]$ if $\theta_1 < \theta_0$, which corresponds to $1/\theta_1 - 1/\theta_0 > 0$. Note also that this is the only way in which $C^*$ depends on the alternative value in this example. That is, the most powerful test of $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$ is exactly the same as the one versus $H_a: \theta = \theta_2$, provided that $\theta_1$ and $\theta_2$ are both greater (or both less) than $\theta_0$. This makes it possible to extend the concept of most powerful test for a simple alternative to a “uniformly most powerful” test for a composite alternative such as $H_a: \theta > \theta_0$. This concept is considered more fully in the next section.

Example 12.6.2 Consider a random sample of size $n$ from a normal distribution with mean zero, $X_i \sim N(0, \sigma^2)$. We wish to test $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 = \sigma_1^2$ with $\sigma_1^2 > \sigma_0^2$. In this case

$$\lambda(\mathbf{x}; \sigma_0^2, \sigma_1^2) = \frac{(2\pi\sigma_0^2)^{-n/2}\exp\left(-\sum x_i^2/2\sigma_0^2\right)}{(2\pi\sigma_1^2)^{-n/2}\exp\left(-\sum x_i^2/2\sigma_1^2\right)}$$

Thus, $\lambda \le k$ is equivalent to

$$(1/\sigma_1^2 - 1/\sigma_0^2)\sum x_i^2 \le k_1$$

for a constant $k_1$. Because $\sigma_1^2 > \sigma_0^2$, we have $1/\sigma_1^2 - 1/\sigma_0^2 < 0$, and the most powerful critical region has the form $C^* = \{(x_1, \ldots, x_n) \mid \sum x_i^2 \ge k_2\}$. Notice also that $\sum X_i^2/\sigma_0^2 \sim \chi^2(n)$ under $H_0$, so that a size $\alpha$ test would reject $H_0$ if $\sum x_i^2/\sigma_0^2 \ge \chi^2_{1-\alpha}(n)$. Note that if $\sigma_1^2 < \sigma_0^2$, a most powerful test of size $\alpha$ would reject if $\sum x_i^2/\sigma_0^2 \le \chi^2_{\alpha}(n)$.
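For the exponential test of Example 12.6.1, the p-value $P[\chi^2(2n) \ge 2\sum x_i/\theta_0]$ can be computed without chi-square tables because the degrees of freedom $2n$ are even. The sketch relies on the standard identity $P[\chi^2(2n) \ge 2t] = P[\text{POI}(t) \le n - 1]$ (a Gamma/Poisson relationship), with $t = \sum x_i/\theta_0$. It rejects $H_0$ at level $\alpha$ exactly when the returned p-value is at most $\alpha$. The function name is illustrative.

```python
from math import exp, log


def exponential_mp_p_value(xs, theta0):
    """p-value of the most powerful test of H0: theta = theta0 vs Ha: theta = theta1 > theta0
    for X_i ~ EXP(theta) (mean theta): P[chi^2(2n) >= 2 * sum(x) / theta0].

    Uses the identity P[chi^2(2n) >= 2t] = P[POI(t) <= n - 1] with t = sum(x) / theta0.
    """
    n = len(xs)
    t = sum(xs) / theta0
    term, total = exp(-t), 0.0
    for k in range(n):
        if k > 0:
            term *= t / k  # Poisson probability of k from that of k - 1
        total += term
    return total


# With n = 1 and x = theta0 * ln(20), the p-value is exactly e^{-ln 20} = 0.05
print(round(exponential_mp_p_value([log(20.0)], theta0=1.0), 6))  # 0.05
```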


The previous examples involve continuous distributions, so that a test with a prescribed size $\alpha$ is possible. For discrete distributions it may not be possible to achieve an exact size $\alpha$, but one could choose $k$ to give size at most $\alpha$ and as close to $\alpha$ as possible. In this case the Neyman-Pearson test is the most powerful test of size $\pi_{C^*}(\theta_0) = \alpha_1 \le \alpha$, and it would be a conservative test for a prescribed size $\alpha$.


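We wish to determine the form of the most powerful test of $H_0: p = p_0$ against $H_a: p = p_1 > p_0$ based on the statistic $S \sim \text{BIN}(n, p)$. We have

$$\lambda(s; p_0, p_1) = \frac{\binom{n}{s}p_0^s(1 - p_0)^{n-s}}{\binom{n}{s}p_1^s(1 - p_1)^{n-s}} = \left[\frac{p_0(1 - p_1)}{p_1(1 - p_0)}\right]^s\left(\frac{1 - p_0}{1 - p_1}\right)^n \le k$$

so that

$$\left[\frac{p_0(1 - p_1)}{p_1(1 - p_0)}\right]^s \le k\left(\frac{1 - p_1}{1 - p_0}\right)^n = k_1$$

or

$$s\,\ln\left[\frac{p_0(1 - p_1)}{p_1(1 - p_0)}\right] \le \ln k_1$$

Because $p_0(1 - p_1)/[p_1(1 - p_0)] < 1$, the log term is negative and the test is to reject $H_0$ if $s \ge k_2$, which is the same form as the binomial test suggested earlier. Now

$$P[S \ge i \mid p = p_0] = 1 - B(i - 1; n, p_0) = \alpha_i$$

so for integer values $i = 1, \ldots, n$, exact most powerful tests of size $\alpha_i$ are achieved by rejecting $H_0$ if $s \ge i$. For other prescribed levels of $\alpha$ the test would be chosen to be conservative as discussed earlier.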


The Neyman-Pearson lemma does not claim that the conservative tests would
be most powerful. Somewhat artificially, one can increase the power of the
conservative test by adding a fraction of a sample point to the critical region in the
discrete case so that P[TI] = α. That is, if P[S ≥ 7] < α and P[S ≥ 6] > α, then
one could reject H₀ if s ≥ 7 and some fraction of the time when s = 6, depending
on what fraction is needed to make the size of the critical region equal α. This is
referred to as a randomized test, because if s = 6 is observed one could flip a coin
(appropriately biased) to decide whether to reject. In most cases it seems more
reasonable to select some exact αᵢ close to the level desired, and then use the
exact, most powerful test for this αᵢ significance level, rather than randomize on
an additional discrete point to get a prescribed size α.
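The randomization fraction is γ = (α − P[S ≥ c]) / P[S = c − 1], where c is the
conservative cutoff. A minimal Python sketch under that reading, with hypothetical
helper names and the illustrative values n = 20, p₀ = 0.5, α = 0.05:

```python
import random
from math import comb


def pmf(s, n, p):
    return comb(n, s) * p**s * (1 - p) ** (n - s)


def randomized_test(n, p0, alpha):
    """Return (c, gamma): reject H0 if s >= c; if s == c - 1, reject with probability gamma."""
    c, tail = n + 1, 0.0  # tail = P[S >= c | p0]
    # Lower the cutoff while the non-randomized size stays at most alpha.
    while c > 0 and tail + pmf(c - 1, n, p0) <= alpha:
        c -= 1
        tail += pmf(c, n, p0)
    if c == 0:  # every sample point is already in the critical region
        return 0, 0.0
    gamma = (alpha - tail) / pmf(c - 1, n, p0)
    return c, gamma


def decide(s, c, gamma, rng):
    """True means reject H0; the biased coin is used only on the boundary point."""
    return s >= c or (s == c - 1 and rng.random() < gamma)


c, gamma = randomized_test(20, 0.5, 0.05)
size = sum(pmf(s, 20, 0.5) for s in range(c, 21)) + gamma * pmf(c - 1, 20, 0.5)
print(c, round(gamma, 4), size)
```

The computed size equals 0.05 exactly (up to rounding), which is what the randomized
test buys in exchange for a decision that depends on an external coin.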

Note that the Neyman-Pearson principle applies to testing any completely
specified H₀: f₀(x; θ₀) against any completely specified alternative Hₐ: f₁(x; θ₁).

In most applications x will result from a random sample from a density with
possibly different values of a parameter, but x could result from a set of order
statistics or some other multivariate variable. Also, the densities need not be of
the same type under H₀ and Hₐ as long as they are completely specified, so that
the statistic can be computed and the critical region can be determined with size
α under H₀.


Example 12.6.4 We have a random sample of size n, and we wish to test H₀: X ~ UNIF(0, 1)
against Hₐ: X ~ EXP(1). We have

$$\lambda(\mathbf{x}) = \frac{f_0(x_1, \ldots, x_n)}{f_1(x_1, \ldots, x_n)} = \frac{1}{\exp\left(-\sum x_i\right)} = \exp\left(\sum x_i\right) \le k$$

so we reject H₀ if Σxᵢ ≤ k₁ = ln k. The distribution of a sum of uniform
variables is not easy to express, but the central limit theorem can be used to obtain
an approximate critical value. We know that if X ~ UNIF(0, 1), then E(X)
= 1/2, Var(X) = 1/12, and

$$Z_n = \frac{\bar{X} - 0.5}{\sqrt{1/(12n)}} \xrightarrow{d} Z \sim N(0, 1)$$

Thus, an approximate size α test is to reject H₀ if

$$z_0 = \sqrt{12n}\,(\bar{x} - 0.5) \le -z_{1-\alpha}$$
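Although the distribution of Σxᵢ is "not easy to express," it does have a known
closed form, the Irwin-Hall CDF, F(x) = (1/n!) Σ_{j ≤ ⌊x⌋} (−1)^j C(n, j)(x − j)^n for
0 ≤ x ≤ n. The Python sketch below (hypothetical helper names) uses it to check the
exact size of the CLT-based test for a few sample sizes.

```python
from math import comb, factorial, floor, sqrt
from statistics import NormalDist


def irwin_hall_cdf(x, n):
    """Exact P[X1 + ... + Xn <= x] for iid UNIF(0, 1) variables (Irwin-Hall CDF)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** j * comb(n, j) * (x - j) ** n for j in range(floor(x) + 1))
    return total / factorial(n)


def clt_cutoff_for_sum(n, alpha):
    """k1 such that 'reject H0 if sum(x) <= k1' is the CLT-based test z0 <= -z_{1-alpha}."""
    z = NormalDist().inv_cdf(1 - alpha)
    # z0 <= -z  <=>  xbar <= 0.5 - z / sqrt(12 n)  <=>  sum(x) <= n/2 - z * sqrt(n / 12)
    return n / 2 - z * sqrt(n / 12)


for n in (2, 5, 12):
    k1 = clt_cutoff_for_sum(n, 0.05)
    print(n, round(k1, 4), round(irwin_hall_cdf(k1, n), 4))  # n, cutoff, exact size
```

The exact size stays close to the nominal 0.05 even for small n, which supports
using the normal approximation here.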
The concept of a most powerful test now will be extended to the case of
composite hypotheses.


12.7 UNIFORMLY MOST POWERFUL TESTS


In the last section we saw that in some cases the same test is most powerful 
against several different alternative values. If a test is most powerful against every 
possible value in a composite alternative, then it will be called a uniformly most 
powerful test. 
Definition 12.7.1


Let X₁, ..., Xₙ have joint pdf f(x₁, ..., xₙ; θ) for θ ∈ Ω, and consider hypotheses of
the form H₀: θ ∈ Ω₀ versus Hₐ: θ ∈ Ω − Ω₀, where Ω₀ is a subset of Ω. A critical
region C*, and the associated test, are said to be uniformly most powerful (UMP) of
size α if

$$\max_{\theta \in \Omega_0} \pi_{C^*}(\theta) = \alpha \qquad (12.7.1)$$

and

$$\pi_{C^*}(\theta) \ge \pi_C(\theta) \qquad (12.7.2)$$

for all θ ∈ Ω − Ω₀ and all critical regions C of size α.

That is, C* defines a UMP test of size α if it has size α, and if for all parameter
values in the alternative, it has maximum power relative to all critical regions of
size α.

A UMP test often exists in the case of a one-sided composite alternative, and a
possible technique for determining a UMP test is first to derive the Neyman-Pearson
test for a particular alternative value and then show that the test does
not depend on the specific alternative value.


Example 12.7.1 Consider a random sample of size n from an exponential distribution,
Xᵢ ~ EXP(θ). It was found in Example 12.6.1 that the most powerful test of size α
of H₀: θ = θ₀ versus Hₐ: θ = θ₁, when θ₁ > θ₀, is to reject H₀ if 2nx̄/θ₀
= 2Σxᵢ/θ₀ ≥ χ²_{1−α}(2n). Because this does not depend on the particular value of
θ₁, but only on the fact that θ₁ > θ₀, it follows that it is a UMP test of H₀: θ =
θ₀ versus Hₐ: θ > θ₀. Note also that the power function for this test can be
expressed in terms of a chi-square CDF, H(c; ν), with ν = 2n. In particular,

$$\pi(\theta) = 1 - H\left[(\theta_0/\theta)\chi^2_{1-\alpha}(2n); 2n\right] \qquad (12.7.3)$$

because (θ₀/θ)[2ΣXᵢ/θ₀] = 2ΣXᵢ/θ ~ χ²(2n) when θ is the true value. Because
π(θ) is an increasing function of θ, max_{θ ≤ θ₀} π(θ) = π(θ₀) = α, and the test is also a
UMP test of size α for the composite hypotheses H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀.
Similarly, a UMP test of either H₀: θ = θ₀ or H₀: θ ≥ θ₀ versus Hₐ: θ < θ₀
is to reject H₀ if 2nx̄/θ₀ ≤ χ²_α(2n), and the associated power function is

$$\pi(\theta) = H\left[(\theta_0/\theta)\chi^2_{\alpha}(2n); 2n\right] \qquad (12.7.4)$$
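Equations (12.7.3) and (12.7.4) can be evaluated without tables because the degrees
of freedom 2n are always even, and for even degrees of freedom the chi-square CDF
reduces to a Poisson sum: H(c; 2m) = 1 − Σ_{j<m} e^{−c/2}(c/2)^j/j!. The Python sketch
below relies on that identity plus bisection for the percentiles; the function names
are hypothetical, and θ₀ = 100, n = 10 in the usage are illustrative values only.

```python
from math import exp


def chi2_cdf_even(c, df):
    """H(c; df) for even df: 1 - sum_{j < df/2} e^{-c/2} (c/2)^j / j!."""
    assert df % 2 == 0 and df > 0
    if c <= 0:
        return 0.0
    half, term, tail = c / 2, exp(-c / 2), 0.0
    for j in range(df // 2):
        tail += term
        term *= half / (j + 1)
    return 1.0 - tail


def chi2_quantile_even(p, df, hi=1e4):
    """Percentile chi2_p(df), found by bisection on H(c; df) = p (df even)."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_upper(theta, theta0, n, alpha):
    """Equation (12.7.3): the test rejects H0 if 2 n xbar / theta0 >= chi2_{1-alpha}(2n)."""
    crit = chi2_quantile_even(1 - alpha, 2 * n)
    return 1.0 - chi2_cdf_even(theta0 / theta * crit, 2 * n)


def power_lower(theta, theta0, n, alpha):
    """Equation (12.7.4): the test rejects H0 if 2 n xbar / theta0 <= chi2_alpha(2n)."""
    crit = chi2_quantile_even(alpha, 2 * n)
    return chi2_cdf_even(theta0 / theta * crit, 2 * n)


for theta in (80, 100, 120, 150):
    print(theta, round(power_upper(theta, 100, 10, 0.05), 4))
```

The printed powers increase with θ and equal α at θ = θ₀, which is exactly the
property used above to extend the test to the composite H₀: θ ≤ θ₀.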


In Example 4.6.3, observed lifetimes of 40 electrical parts were given, and it was
conjectured that these observations might be exponentially distributed with mean
lifetime θ = 100 months. In a particular application, suppose that the parts will
be unsuitable if the mean is less than 100. We will carry out a size α = 0.05 test of
H₀: θ ≥ 100 versus Hₐ: θ < 100. The sample mean is x̄ = 93.1, and consequently
2nx̄/θ₀ = (80)(93.1)/100 = 74.48 > 60.39 = χ²_{0.05}(80), which means that we
cannot reject H₀ at the 0.05 level of significance. Suppose that we wish to know
P[TII] if, in fact, the mean is θ = 50 months. According to function (12.7.4),
π(50) = H(120.78; 80). Table 5 (Appendix C) is generally useful for determining
such quantities, although in this case ν = 80 exceeds the values given in the table.
From Table 4 we can determine that the power is between 0.995 and 0.999,
because 116.32 < 120.78 < 124.84; however, an approximate value can be found
from the approximation provided with Table 5. Specifically, H(120.78; 80)
= 0.9978, so P[TII] = 1 − 0.9978 = 0.0022, and a Type II error is quite unlikely
for this alternative.
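The approximation printed with Table 5 is not reproduced in this section. As an
assumption for illustration only, the Python sketch below uses the Wilson-Hilferty
cube-root normal approximation to the chi-square distribution, which reproduces the
numbers quoted above for n = 40, x̄ = 93.1, θ₀ = 100 and θ = 50.

```python
from statistics import NormalDist


def wh_chi2_cdf(c, df):
    """Wilson-Hilferty approximation to H(c; df) = P[chi2(df) <= c]."""
    v = 2 / (9 * df)
    z = ((c / df) ** (1 / 3) - (1 - v)) / v**0.5
    return NormalDist().cdf(z)


def wh_chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the percentile chi2_p(df)."""
    v = 2 / (9 * df)
    return df * (1 - v + NormalDist().inv_cdf(p) * v**0.5) ** 3


n, xbar, theta0, theta1, alpha = 40, 93.1, 100, 50, 0.05
stat = 2 * n * xbar / theta0                         # observed 2 n xbar / theta0
crit = wh_chi2_quantile(alpha, 2 * n)                # approximates chi2_{0.05}(80)
power = wh_chi2_cdf(theta0 / theta1 * crit, 2 * n)   # approximates H(120.78; 80)
print(round(stat, 2), round(crit, 2), round(power, 4), round(1 - power, 4))
```

Because stat exceeds crit, H₀ is not rejected, and the approximate power at θ = 50 is
about 0.9978, matching the text.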


Note that for the two-sided composite alternative Hₐ: θ ≠ θ₀, it is not possible
to find a test that is UMP for every alternative value. For an alternative value
θ₁ > θ₀ a right-tail critical region is optimal, but if the true θ is θ₁ < θ₀ then the
right-tail critical region is very poor, and vice versa. As suggested earlier, we
could compromise in this case and take a two-sided critical region, but it is not
most powerful for any particular alternative. It is possible to extend the concept
of unbiasedness to tests of hypotheses, and in the restricted class of unbiased tests
there may be a UMP unbiased test for a two-sided composite alternative. This
concept will be discussed briefly later.

It is easy to see that the other Neyman-Pearson tests illustrated in Examples 
12.6.2 and 12.6.3 also provide UMP tests for the corresponding one-sided com- 
posite alternatives. General results along these lines can be stated for any pdf that 
satisfies a property known as the “monotone likelihood ratio.” 

For the sake of brevity, most of the distributional results in the rest of the
chapter will be stated in vector notation. For example, if X = (X₁, ..., Xₙ), then
X ~ f(x; θ) will mean that X₁, ..., Xₙ have joint pdf f(x₁, ..., xₙ; θ).


Definition 12.7.2
A joint pdf f(x; θ) is said to have a monotone likelihood ratio (MLR) in the statistic
T = t(X) if for any two values of the parameter, θ₁ < θ₂, the ratio f(x; θ₂)/f(x; θ₁)
depends on x only through the function t(x), and this ratio is a nondecreasing
function of t(x).


Notice that the MLR property also will hold for any increasing function of t(x).


Example 12.7.2 Consider a random sample of size n from an exponential distribution,
Xᵢ ~ EXP(θ). Because f(x; θ) = (1/θ)ⁿ exp(−Σxᵢ/θ), we have

$$\frac{f(\mathbf{x}; \theta_2)}{f(\mathbf{x}; \theta_1)} = \left(\frac{\theta_1}{\theta_2}\right)^{n}\exp\left[\left(\frac{1}{\theta_1} - \frac{1}{\theta_2}\right)\sum x_i\right]$$

which is a nondecreasing function of t(x) = Σxᵢ if θ₂ > θ₁. Thus, f(x; θ) has the
MLR property in the statistic T = ΣXᵢ. Notice that the MLR property also
holds for the statistic X̄, because it is an increasing function of T.


The MLR property is useful in deriving UMP tests. 


Theorem 12.7.1 If a joint pdf f(x; θ) has a monotone likelihood ratio in the statistic T = t(X),
then a UMP test of size α for H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀ is to reject H₀ if
t(x) ≥ k, where P[t(X) ≥ k | θ₀] = α.


The dual problem of testing H₀: θ ≥ θ₀ versus Hₐ: θ < θ₀ also can be
handled by the MLR approach, but the inequalities in Theorem 12.7.1 should be
reversed. Also, if the ratio is a nonincreasing function of t(x), then H₀ of the
theorem can be rejected with the inequalities in t(x) reversed.

In many applications, the terms nondecreasing and nonincreasing can be
replaced with the terms increasing and decreasing, respectively, but not in all
applications, as the next example demonstrates.


Example 12.7.3 Consider a random sample of size n from a two-parameter exponential
distribution, Xᵢ ~ EXP(1, η). The joint pdf is

$$f(\mathbf{x}; \eta) = \begin{cases} \exp\left[-\sum (x_i - \eta)\right] & \eta < x_{1:n} \\ 0 & x_{1:n} \le \eta \end{cases}$$

If η₁ < η₂, then

$$\frac{f(\mathbf{x}; \eta_2)}{f(\mathbf{x}; \eta_1)} = \begin{cases} 0 & \eta_1 < x_{1:n} \le \eta_2 \\ \exp\left[n(\eta_2 - \eta_1)\right] & \eta_2 < x_{1:n} \end{cases}$$

That this function is not defined for x_{1:n} ≤ η₁ is not a problem, because
P[X_{1:n} ≤ η₁] = 0 when η₁ is the true value of η. Thus, the ratio is a
nondecreasing function of x_{1:n}, and the MLR property holds for T = X_{1:n}. According
to Theorem 12.7.1, a UMP test of size α for H₀: η ≤ η₀ versus Hₐ: η > η₀ is to
reject H₀ if x_{1:n} ≥ k, where α = P[X_{1:n} ≥ k | η₀] = exp[−n(k − η₀)], and thus
k = η₀ − (ln α)/n.
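The cutoff k = η₀ − (ln α)/n can be checked by simulation. The Python sketch below
is illustrative only: the helper names, η₀ = 2, n = 10, α = 0.05 and the seed are
hypothetical, and each Xᵢ is generated as η₀ plus a standard exponential variable.

```python
import math
import random


def min_cutoff(eta0, n, alpha):
    """k = eta0 - ln(alpha)/n, so that P[X_{1:n} >= k | eta0] = exp[-n(k - eta0)] = alpha."""
    return eta0 - math.log(alpha) / n


def simulated_size(k, eta0, n, reps=20_000, seed=1):
    """Monte Carlo estimate of P[X_{1:n} >= k] when X_i ~ EXP(1, eta0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x_min = min(eta0 + rng.expovariate(1.0) for _ in range(n))
        hits += x_min >= k
    return hits / reps


k = min_cutoff(eta0=2.0, n=10, alpha=0.05)
print(round(k, 4), simulated_size(k, 2.0, 10))
```

The simulated rejection rate under η = η₀ should land near α = 0.05, agreeing with the
closed form exp[−n(k − η₀)].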


Theorem 12.7.2 Suppose that X₁, ..., Xₙ have joint pdf of the form

$$f(\mathbf{x}; \theta) = c(\theta)h(\mathbf{x})\exp\left[q(\theta)t(\mathbf{x})\right] \qquad (12.7.5)$$

where q(θ) is an increasing function of θ.

1. A UMP test of size α for H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀ is to reject H₀ if
t(x) ≥ k, where P[t(X) ≥ k | θ₀] = α.

2. A UMP test of size α for H₀: θ ≥ θ₀ versus Hₐ: θ < θ₀ is to reject H₀ if
t(x) ≤ k, where P[t(X) ≤ k | θ₀] = α.

Proof
If θ₁ < θ₂, then q(θ₁) < q(θ₂), so that

$$\frac{f(\mathbf{x}; \theta_2)}{f(\mathbf{x}; \theta_1)} = \frac{c(\theta_2)}{c(\theta_1)}\exp\left\{\left[q(\theta_2) - q(\theta_1)\right]t(\mathbf{x})\right\}$$

which is an increasing function of t(x) because q(θ₂) − q(θ₁) > 0. The theorem
follows by the MLR property.

An obvious application of the theorem occurs when X₁, ..., Xₙ is a random
sample from a member of the regular exponential class, say f(x; θ)
= c(θ)h(x) exp[q(θ)u(x)] with t(x) = Σu(xᵢ) and q(θ) an increasing function of θ.

Example 12.7.4 Consider a random sample of size n from a Poisson distribution, Xᵢ ~ POI(μ).
The joint pdf is

$$f(\mathbf{x}; \mu) = \prod \frac{e^{-\mu}\mu^{x_i}}{x_i!} = e^{-n\mu}\left(\prod \frac{1}{x_i!}\right)\exp\left[(\ln \mu)\sum x_i\right], \qquad \text{all } x_i = 0, 1, \ldots$$

The theorem applies with q(μ) = ln μ and t(x) = Σxᵢ. A UMP test of size α
for H₀: μ ≤ μ₀ versus Hₐ: μ > μ₀ would reject H₀ if T = ΣXᵢ ≥ k, where
P[T ≥ k | μ₀] = α. Because T ~ POI(nμ), we must have

$$\sum_{t=k}^{\infty}\frac{e^{-n\mu_0}(n\mu_0)^{t}}{t!} = \alpha$$

We again have a discreteness problem, but the tests described in Theorem
12.7.2 are UMP for the particular values of α that can be achieved.
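Because of the discreteness, in practice one takes the smallest k whose Poisson
upper-tail probability does not exceed α. A minimal Python sketch of that search;
the helper names and the values n = 10, μ₀ = 0.5, α = 0.05 are hypothetical.

```python
from math import exp


def poisson_upper_tail(k, lam):
    """P[T >= k] for T ~ POI(lam)."""
    term, cdf = exp(-lam), 0.0  # term starts at P[T = 0]
    for t in range(k):          # accumulate P[T <= k - 1]
        cdf += term
        term *= lam / (t + 1)
    return 1.0 - cdf


def poisson_ump_cutoff(n, mu0, alpha):
    """Smallest k with P[sum X_i >= k | mu0] <= alpha, together with that achieved size."""
    k = 0
    while poisson_upper_tail(k, n * mu0) > alpha:
        k += 1
    return k, poisson_upper_tail(k, n * mu0)


k, size = poisson_ump_cutoff(n=10, mu0=0.5, alpha=0.05)
print(k, round(size, 4))
```

Here T ~ POI(5) under μ₀, so the test rejects when Σxᵢ ≥ 10 and its achieved size is
about 0.0318, one of the "particular values of α" for which the test is UMP.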

As mentioned earlier, there is a close relationship between tests and confidence
intervals. If one tests H₀: θ = θ₀ against Hₐ: θ ≠ θ₀ at the α significance level,
then for a given sample, the set of θ₀ for which H₀ would not be rejected
represents a 100(1 − α)% confidence region for θ. Loosely speaking, if the acceptance
set of a size α test is an interval then it is a 100(1 − α)% confidence interval.
Thus, one approach to find a confidence interval is first to derive an associated
test by one of the techniques discussed earlier. Goodness properties of confidence
intervals usually are expressed in terms of the associated test. For example, a
confidence region associated with a UMP test is termed uniformly most accurate
(UMA).
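The duality can be seen numerically by scanning candidate values θ₀ and keeping
those the test does not reject. The Python sketch below does this for the two-sided z
test of a normal mean with σ known; the helper names, the grid, and the summary data
x̄ = 10, σ = 2, n = 25 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_accepts(theta0, xbar, sigma, n, crit):
    """True if the two-sided z test does not reject H0: mu = theta0 (sigma known)."""
    return abs(sqrt(n) * (xbar - theta0) / sigma) < crit


def invert_z_test(xbar, sigma, n, alpha, lo, hi, steps=200_000):
    """Approximate the acceptance set {theta0 : H0 not rejected} on a grid over [lo, hi]."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    kept = [
        theta0
        for theta0 in (lo + (hi - lo) * i / steps for i in range(steps + 1))
        if z_test_accepts(theta0, xbar, sigma, n, crit)
    ]
    return min(kept), max(kept)


xbar, sigma, n = 10.0, 2.0, 25
lo_end, hi_end = invert_z_test(xbar, sigma, n, 0.05, lo=8.0, hi=12.0)
half = NormalDist().inv_cdf(0.975) * sigma / sqrt(n)
print(lo_end, hi_end, xbar - half, xbar + half)
```

The ends of the acceptance set agree, up to the grid spacing, with the familiar
interval x̄ ± z_{0.975}σ/√n.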


416 CHAPTER 12 TESTS OF HYPOTHESES 


UNBIASED TESTS 


It was mentioned earlier that in some cases where a UMP test may not exist, 
particularly for a two-sided alternative, there may exist a UMP test among the 
restricted class of “unbiased” tests. 


Definition 12.7.3
A test of H₀: θ ∈ Ω₀ versus Hₐ: θ ∈ Ω − Ω₀ is unbiased if

$$\min_{\theta \in \Omega - \Omega_0} \pi(\theta) \ge \max_{\theta \in \Omega_0} \pi(\theta) \qquad (12.7.6)$$

In other words, the probability of rejecting H₀ when it is false is at least as
large as the probability of rejecting H₀ when it is true.


Example 12.7.5 Consider a random sample of size n from a normal distribution with mean zero,
Xᵢ ~ N(0, σ²). It is desired to test H₀: σ² = σ₀² versus a two-sided alternative,
Hₐ: σ² ≠ σ₀², based on the test statistic S₀ = ΣXᵢ²/σ₀². Under H₀ we know that
S₀ ~ χ²(n), so an equal-tailed critical region, similar to that of part 3 of Theorem
12.3.3, would reject H₀ if s₀ ≤ χ²_{α/2}(n) or s₀ ≥ χ²_{1−α/2}(n). In particular, consider a
sample of size n = 2 and a test of size α = 0.05 for H₀: σ² = 1. The graph of the
power function for this test is given in Figure 12.5.

FIGURE 12.5 The power function of a biased test, π(σ²) = P[reject H₀ | σ²].

The minimum value of π(σ²) occurs at a value σ² ≠ σ₀², and thus the test is less
likely to reject H₀ for some values σ² ≠ σ₀² than it is when σ² = σ₀². Consequently,
the test is not unbiased. It is possible to construct an unbiased two-sided test if
we abandon the convenient equal-tailed test in favor of one with a particular
choice of critical values χ²_{α₁}(n) and χ²_{1−α₂}(n) with α₁ + α₂ = α, but α₁ ≠ α₂ (see
Exercise 26). Such a test is not very convenient to use, and the biased equal-tailed
test usually is preferred in practice.
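For n = 2 the chi-square CDF is H(c; 2) = 1 − e^{−c/2}, so the power function of the
equal-tailed test has a closed form and the bias can be located directly. A minimal
Python sketch (the function name and the σ² grid are hypothetical):

```python
import math


def power_equal_tailed_n2(sigma2, alpha=0.05, sigma2_0=1.0):
    """pi(sigma^2) for the equal-tailed test when n = 2.

    S0 = sum X_i^2 / sigma2_0 is distributed as (sigma2 / sigma2_0) * chi2(2),
    and chi2(2) has CDF 1 - exp(-c/2), so every term is in closed form.
    """
    a = -2.0 * math.log(1 - alpha / 2)  # chi2_{alpha/2}(2)
    b = -2.0 * math.log(alpha / 2)      # chi2_{1-alpha/2}(2)
    r = sigma2_0 / sigma2
    return (1.0 - math.exp(-a * r / 2)) + math.exp(-b * r / 2)  # P[S0 <= a] + P[S0 >= b]


grid = [0.1 + 0.001 * i for i in range(2901)]  # sigma^2 from 0.1 to 3.0
powers = [power_equal_tailed_n2(s2) for s2 in grid]
i_min = min(range(len(grid)), key=powers.__getitem__)
print(round(power_equal_tailed_n2(1.0), 4), round(grid[i_min], 3), round(powers[i_min], 4))
```

The power is 0.05 at σ² = 1 but dips to roughly 0.040 near σ² ≈ 0.74, which is the
behavior Figure 12.5 depicts.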


It can be shown that the test described above is a UMP test among the
restricted class of unbiased tests of H₀. In fact, it can be shown, under certain
conditions, that for joint pdf's of the form given in equation (12.7.5), a uniformly
most powerful unbiased (UMPU) test of H₀: θ = θ₀ versus Hₐ: θ ≠ θ₀ exists.
Methods for deriving UMPU tests are given by Lehmann (1959), but they will
not be discussed here.


12.8 GENERALIZED LIKELIHOOD RATIO TESTS


The Neyman-Pearson lemma provides a method for deriving a most powerful
test of simple hypotheses, and quite often this test also will be UMP for a
one-sided composite alternative. Two theorems that were stated in the previous
section also are useful in deriving UMP tests when there is a single unknown
parameter.

Methods for deriving tests also are needed when unknown nuisance parameters
are present or in other situations where the methods for determining a
UMP test do not appear applicable. For example, in a two-sample normal
problem, one may test the hypothesis that the means are equal, H₀: μ₁ = μ₂,
without specifying them to be equal to a particular value. We discussed several
natural tests in the first five sections, which were mostly suggested by analogous
confidence-interval results. Of course, some of these tests also are UMP tests, but
it is clear more generally that a test can be based on any statistic for which a
critical region of the desired size can be determined. The problem then is to
choose a good test statistic and a reasonable form for the critical region to obtain
a test with high power. If the distribution depends on a single unknown
parameter, then a single sufficient statistic or an MLE may be available, and a test
could be based on this statistic. For a multiparameter problem, a test statistic
might be some function of joint sufficient statistics or joint MLEs, but it may not
always be clear what test statistic would be most suitable. Of course, the
distribution of the statistic must be such that the size of the critical region can be
computed and not depend on unknown parameters. For example, Student's t statistic
may be used to test the mean of a normal distribution when the variance is
unknown. Given a suitable test statistic, then, as with the Neyman-Pearson
lemma, sample points should be included in the critical region that are less likely
to occur under H₀ and more likely to occur under Hₐ.

The generalized likelihood ratio test is a generalization of the Neyman-Pearson
test, and it provides a desirable test in many applications.


Definition 12.8.1
Let X = (X₁, ..., Xₙ) where X₁, ..., Xₙ have joint pdf f(x; θ) for θ ∈ Ω, and
consider the hypothesis H₀: θ ∈ Ω₀ versus Hₐ: θ ∈ Ω − Ω₀. The generalized likelihood
ratio (GLR) is defined by

$$\lambda(\mathbf{x}) = \frac{\max_{\theta \in \Omega_0} f(\mathbf{x}; \theta)}{\max_{\theta \in \Omega} f(\mathbf{x}; \theta)} = \frac{f(\mathbf{x}; \hat{\theta}_0)}{f(\mathbf{x}; \hat{\theta})} \qquad (12.8.1)$$

where θ̂ denotes the usual MLE of θ, and θ̂₀ denotes the MLE under the restriction
that H₀ is true.

In other words, θ̂ and θ̂₀ are obtained by maximizing f(x; θ) over Ω and Ω₀,
respectively. The generalized likelihood ratio test is to reject H₀ if λ(x) ≤ k, where
k is chosen to provide a size α test.

Another, slightly different approach is to maximize over θ ∈ Ω − Ω₀ in the
denominator, but the form in equation (12.8.1) often is easier to evaluate and
yields equivalent results.

Essentially, the GLR principle determines the critical region and associated
test by deciding which points will be included according to the ratio of estimated
likelihoods of the observed data, where the numerator is estimated under the
restriction that H₀ is true. This is similar to the Neyman-Pearson principle where
the likelihoods are completely specified, but it is not, strictly speaking, a complete
generalization of the Neyman-Pearson principle, because the unrestricted
estimate, θ̂, could possibly be in Ω₀.

We see that λ(X) is a valid test statistic that is not a function of unknown
parameters; in many cases the distribution of λ(X) is free of parameters, and the
exact critical value k can be determined. In some cases the distribution of λ(X)
under H₀ depends on unknown parameters, and an exact size α critical region
cannot be determined. If regularity conditions hold, which ensure that the MLEs
are asymptotically normally distributed, then it can be shown that the
asymptotic distribution of λ(X) is free of parameters, and an approximate size α test
will be available for large n. In particular, if X ~ f(x; θ₁, ..., θ_k), then under
H₀: (θ₁, ..., θ_r) = (θ_{10}, ..., θ_{r0}), r < k, approximately, for large n,

$$-2\ln\lambda(\mathbf{X}) \sim \chi^2(r) \qquad (12.8.2)$$

Thus, an approximate size α test is to reject H₀ if

$$-2\ln\lambda(\mathbf{x}) \ge \chi^2_{1-\alpha}(r) \qquad (12.8.3)$$
Example 12.8.1 Suppose that Xᵢ ~ N(μ, σ²), where σ² is known, and we wish to test H₀: μ = μ₀
against Hₐ: μ ≠ μ₀. The usual (unrestricted) MLE is μ̂ = x̄, and the GLR is

$$\lambda(\mathbf{x}) = \frac{f(\mathbf{x}; \mu_0)}{f(\mathbf{x}; \hat{\mu})} = \frac{(2\pi\sigma^2)^{-n/2}\exp\left[-\sum (x_i - \mu_0)^2/2\sigma^2\right]}{(2\pi\sigma^2)^{-n/2}\exp\left[-\sum (x_i - \bar{x})^2/2\sigma^2\right]}$$

which, after some simplification, can be expressed as

$$\lambda(\mathbf{x}) = \exp\left[-n(\bar{x} - \mu_0)^2/2\sigma^2\right]$$

Rejecting H₀ if λ(x) ≤ k is equivalent to rejecting H₀ if

$$-2\ln\lambda(\mathbf{x}) = \left[\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right]^2 = z_0^2 \ge -2\ln k$$

where Z ~ N(0, 1) and Z² ~ χ²(1). Thus, a size α test is to reject H₀ if

$$z_0^2 \ge \chi^2_{1-\alpha}(1)$$

or equivalently, reject H₀ if

$$z_0 \le -z_{1-\alpha/2} \quad \text{or} \quad z_0 \ge z_{1-\alpha/2}$$

Thus the likelihood ratio test may be reduced to the usual two-sided equal-tailed
normal test. It is interesting to note that the asymptotic approximation,
−2 ln λ(X) ~ χ²(1), is exact in this example.
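The reduction −2 ln λ(x) = z₀² can be confirmed numerically by evaluating both
log-likelihoods directly. A minimal Python sketch; the helper names and the sample
below are hypothetical.

```python
import math


def normal_loglik(xs, mu, sigma2):
    """ln f(x; mu, sigma^2) for an iid normal sample."""
    n = len(xs)
    return -n / 2 * math.log(2 * math.pi * sigma2) - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)


def glr_known_variance(xs, mu0, sigma2):
    """Return (-2 ln lambda(x), z0^2) for H0: mu = mu0 with sigma^2 known."""
    n = len(xs)
    xbar = sum(xs) / n  # unrestricted MLE of mu
    minus2loglam = -2 * (normal_loglik(xs, mu0, sigma2) - normal_loglik(xs, xbar, sigma2))
    z0 = (xbar - mu0) / math.sqrt(sigma2 / n)
    return minus2loglam, z0**2


xs = [4.1, 5.3, 6.0, 4.8, 5.9]
print(glr_known_variance(xs, mu0=5.0, sigma2=1.0))
```

Both components agree, because Σ(xᵢ − μ₀)² − Σ(xᵢ − x̄)² = n(x̄ − μ₀)².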


Example 12.8.2 We now consider the hypotheses H₀: μ = μ₀ against Hₐ: μ > μ₀. For practical
purposes, the GLR test in this case reduces to the one-sided UMP test based on
z₀; however, there is one technical difference. In this case, the MLE relative to
Ω = [μ₀, ∞) is

$$\hat{\mu} = \begin{cases} \bar{x} & \bar{x} > \mu_0 \\ \mu_0 & \bar{x} \le \mu_0 \end{cases}$$

Because the size of the test, α, usually is quite small, and we will be rejecting H₀
for large x̄, ordinarily we will be concerned only with determining a critical
region for the GLR statistic for the case when x̄ > μ₀.
Specifically, we have

$$\lambda(\mathbf{x}) = \begin{cases} \exp\left[-n(\bar{x} - \mu_0)^2/2\sigma^2\right] & \bar{x} > \mu_0 \\ 1 & \bar{x} \le \mu_0 \end{cases}$$

but under H₀, P[λ(X) < 1] = P[X̄ > μ₀] = 0.5. So, for α < 0.5, k < 1, and the
critical region will not contain any x such that λ(x) = 1. In particular, the
GLR test is to reject H₀ if x̄ > μ₀ and λ(x) ≤ k; or x̄ > μ₀ and
z₀² = [√n(x̄ − μ₀)/σ]² ≥ k₁. When x̄ > μ₀, z₀² ≥ k₁ if and only if z₀ ≥ √k₁. Thus, a
size α test (for α < 0.5) is to reject H₀ if z₀ ≥ z_{1−α}, which also is the UMP test for
these hypotheses.

The GLR test of H₀: μ ≤ μ₀ against Hₐ: μ > μ₀ is somewhat similar. In this
case, μ̂ = x̄, but maximizing the likelihood function over Ω₀ gives

$$\hat{\mu}_0 = \begin{cases} \bar{x} & \bar{x} \le \mu_0 \\ \mu_0 & \bar{x} > \mu_0 \end{cases}$$

Thus,

$$\lambda(\mathbf{x}) = \begin{cases} 1 & \bar{x} \le \mu_0 \\ \exp\left[-n(\bar{x} - \mu_0)^2/2\sigma^2\right] & \bar{x} > \mu_0 \end{cases}$$

This is the same result as obtained for testing the simple H₀: μ = μ₀ against
Hₐ: μ > μ₀. The same critical region also gives a size α test for the composite
null hypothesis H₀: μ ≤ μ₀, because with Z₀ = (X̄ − μ₀)/(σ/√n), P[Z₀ ≥ z_{1−α} | μ₀] = α
and P[Z₀ ≥ z_{1−α} | μ] < α when μ < μ₀.

It sometimes is desired to test a hypothesis about one unknown parameter in
the presence of another unknown nuisance parameter.

Example 12.8.3 Suppose now that Xᵢ ~ N(μ, σ²) where σ² is assumed unknown, and we wish to
test H₀: μ = μ₀ against Hₐ: μ ≠ μ₀. This does not represent a simple null
hypothesis, because the distribution is not specified completely under H₀.
The parameter space is two-dimensional, Ω = (−∞, ∞) × (0, ∞)
= {(μ, σ²) | −∞ < μ < ∞ and σ² > 0}, and H₀: μ = μ₀ is an abbreviated notation
for H₀: (μ, σ²) ∈ Ω₀ where Ω₀ = {μ₀} × (0, ∞) = {(μ, σ²) | μ = μ₀ and
σ² > 0}. These sets are illustrated in Figure 12.6.

FIGURE 12.6 A subset Ω₀ of hypothesized values μ₀.

Maximizing f(x; μ, σ²) over Ω yields the usual MLEs μ̂ = x̄ and
σ̂² = Σ(xᵢ − x̄)²/n, but over Ω₀ we obtain μ̂₀ = μ₀ and σ̂₀² = Σ(xᵢ − μ₀)²/n.
Thus,

$$\lambda(\mathbf{x}) = \frac{f(\mathbf{x}; \mu_0, \hat{\sigma}_0^2)}{f(\mathbf{x}; \hat{\mu}, \hat{\sigma}^2)} = \frac{(2\pi\hat{\sigma}_0^2)^{-n/2}\exp\left[-\sum (x_i - \mu_0)^2/2\hat{\sigma}_0^2\right]}{(2\pi\hat{\sigma}^2)^{-n/2}\exp\left[-\sum (x_i - \bar{x})^2/2\hat{\sigma}^2\right]} = \left[\hat{\sigma}_0^2/\hat{\sigma}^2\right]^{-n/2}$$

and consequently,

$$[\lambda(\mathbf{x})]^{-2/n} = \frac{\sum (x_i - \mu_0)^2}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2}{\sum (x_i - \bar{x})^2} = 1 + \frac{t^2(\mathbf{x})}{n - 1}$$

where t(x) = √n(x̄ − μ₀)/√[Σ(xᵢ − x̄)²/(n − 1)] = √n(x̄ − μ₀)/s. Under H₀:
μ = μ₀, T = t(X) ~ t(n − 1) and T² ~ F(1, n − 1). Thus, rejecting H₀ when λ(x)
is small is equivalent to rejecting H₀ if T² is large, and a size α test is to reject H₀
if t² ≥ f_{1−α}(1, n − 1), or alternatively if

$$t \le -t_{1-\alpha/2}(n-1) \quad \text{or} \quad t \ge t_{1-\alpha/2}(n-1)$$

Thus the two-sided test proposed earlier based on Student's t pivotal quantity is
now seen to be a GLR test. This two-sided test is not UMP, but it can be shown
to be UMPU.
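The identity [λ(x)]^{−2/n} = 1 + t²/(n − 1) can be checked by computing λ(x) from the
two maximized log-likelihoods rather than from the simplified formula. A minimal
Python sketch with hypothetical helper names and sample values:

```python
import math


def normal_loglik(xs, mu, sigma2):
    """ln f(x; mu, sigma^2) for an iid normal sample."""
    n = len(xs)
    return -n / 2 * math.log(2 * math.pi * sigma2) - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)


def glr_unknown_variance(xs, mu0):
    """Return (lambda(x), t) for H0: mu = mu0 with sigma^2 unknown."""
    n = len(xs)
    xbar = sum(xs) / n
    s2_hat = sum((x - xbar) ** 2 for x in xs) / n   # MLE of sigma^2 over Omega
    s2_hat0 = sum((x - mu0) ** 2 for x in xs) / n   # MLE of sigma^2 over Omega_0
    lam = math.exp(normal_loglik(xs, mu0, s2_hat0) - normal_loglik(xs, xbar, s2_hat))
    s = math.sqrt(n * s2_hat / (n - 1))             # usual sample standard deviation
    t = math.sqrt(n) * (xbar - mu0) / s
    return lam, t


xs = [4.1, 5.3, 6.0, 4.8, 5.9]
lam, t = glr_unknown_variance(xs, mu0=5.0)
n = len(xs)
print(lam ** (-2 / n), 1 + t**2 / (n - 1))  # the two sides of the identity
```

Since λ(x) is a decreasing function of t², small λ(x) and large |t| select the same
critical region.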


The GLR approach also can be used to derive tests for two-sample problems. 


Example 12.8.4 Suppose that X ~ BIN(n₁, p₁) and Y ~ BIN(n₂, p₂) with X and Y independent,
and we wish to test whether the proportions are equal, H₀: p₁ = p₂ = p against
Hₐ: p₁ ≠ p₂, where p is unknown. The parameter space is Ω = (0, 1) × (0, 1)
= {(p₁, p₂) | 0 < p₁ < 1 and 0 < p₂ < 1}, and the subset corresponding to H₀ is
Ω₀ = {(p₁, p₂) | 0 < p₁ = p₂ < 1}. These sets are illustrated in Figure 12.7.

FIGURE 12.7 A subset of hypothesized values p₁ = p₂.

Based on x and y, the MLEs are p̂₁ = x/n₁, p̂₂ = y/n₂ over Ω, and
p̂₀ = (x + y)/(n₁ + n₂) over Ω₀. The GLR statistic is

$$\lambda(x, y) = \frac{f(x; \hat{p}_0)f(y; \hat{p}_0)}{f(x; \hat{p}_1)f(y; \hat{p}_2)} = \frac{\binom{n_1}{x}\hat{p}_0^{x}(1-\hat{p}_0)^{n_1-x}\binom{n_2}{y}\hat{p}_0^{y}(1-\hat{p}_0)^{n_2-y}}{\binom{n_1}{x}\hat{p}_1^{x}(1-\hat{p}_1)^{n_1-x}\binom{n_2}{y}\hat{p}_2^{y}(1-\hat{p}_2)^{n_2-y}}$$

Except for the cancellation of the binomial coefficients, this particular GLR
statistic does not appear to simplify greatly, but it can be computed easily. The
distribution of λ(X, Y) will depend on p under H₀, however, so an exact size α
critical region cannot be determined. The chi-square approximation should be
useful here for large sample sizes. The hypothesis H₀ represents a
one-dimensional restriction in the parameter space, because it corresponds to
H₀: θ = p₁ − p₂ = 0; so r = 1 in expression (12.8.2), and approximately for a size
α test,

$$-2\ln\lambda(X, Y) \sim \chi^2(1)$$

and H₀ is rejected if −2 ln λ(x, y) ≥ χ²_{1−α}(1).
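Computing −2 ln λ(x, y) directly is straightforward once the convention 0 · ln 0 = 0
is handled. The Python sketch below is illustrative: the helper names and the counts
x = 30 of n₁ = 100 and y = 45 of n₂ = 100 are hypothetical, and χ²_{0.95}(1) is obtained
as z²_{0.975}.

```python
import math
from statistics import NormalDist


def xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_kernel_loglik(x, n, p):
    """ln[p^x (1 - p)^(n - x)]; the binomial coefficient cancels in the GLR."""
    return xlogy(x, p) + xlogy(n - x, 1 - p)


def two_sample_binomial_glr(x, n1, y, n2):
    """-2 ln lambda(x, y) for H0: p1 = p2."""
    p1_hat, p2_hat = x / n1, y / n2
    p0_hat = (x + y) / (n1 + n2)
    loglam = (binom_kernel_loglik(x, n1, p0_hat) + binom_kernel_loglik(y, n2, p0_hat)
              - binom_kernel_loglik(x, n1, p1_hat) - binom_kernel_loglik(y, n2, p2_hat))
    return -2 * loglam


stat = two_sample_binomial_glr(30, 100, 45, 100)
crit = NormalDist().inv_cdf(0.975) ** 2  # chi2_{0.95}(1)
print(round(stat, 4), round(crit, 4), stat >= crit)
```

With these counts −2 ln λ is about 4.82, exceeding 3.84, so H₀ would be rejected at
the approximate 0.05 level.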


The more common approach in this case is to use the approximate normal test 
statistic given by expression (12.4.2). It also is interesting to note that an exact 
conditional test can be constructed in this case, as discussed in the next section. 
k-SAMPLE TESTS

The GLR approach also is helpful in deriving important tests involving samples
from two or more normal distributions.

Suppose that independent random samples are obtained from k normal
distributions. For each j = 1, ..., k, we denote the mean and variance of the jth
distribution by μⱼ and σⱼ², respectively, and we denote by nⱼ the sample size. Also,
we denote by xᵢⱼ the observed value of the ith random variable from the
jth random sample, Xᵢⱼ ~ N(μⱼ, σⱼ²). We will denote the individual jth
sample means and sample variances, respectively, by x̄ⱼ = Σᵢ xᵢⱼ/nⱼ and
sⱼ² = Σᵢ (xᵢⱼ − x̄ⱼ)²/(nⱼ − 1). It also will be convenient to adopt a notation for the
mean of the pooled samples, namely x̄ = Σⱼ Σᵢ xᵢⱼ/N = Σⱼ nⱼx̄ⱼ/N with
N = n₁ + ··· + n_k.

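A test for the equality of distribution means now will be derived, assuming a
common but unknown variance, say σⱼ² = σ². Specifically, we will derive the GLR
test of H₀: μ₁ = ··· = μ_k versus the alternative that μᵢ ≠ μⱼ for at least one pair
i ≠ j. The likelihood is

$$L = L(\mu_1, \ldots, \mu_k, \sigma^2) = \prod_{j=1}^{k}\prod_{i=1}^{n_j}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x_{ij} - \mu_j)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-N/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{j}\sum_{i}(x_{ij} - \mu_j)^2\right]$$

which yields the following log-likelihood:

$$\ln L = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{j}\sum_{i}(x_{ij} - \mu_j)^2$$

Relative to the parameter space Ω = {(μ₁, ..., μ_k, σ²) | −∞ < μⱼ < ∞, σ² > 0},
we compute the MLEs by taking partials of ln L with respect to each of the k + 1
parameters and equating them to zero:

$$\frac{\partial}{\partial\mu_j}\ln L = -\frac{1}{2\sigma^2}\sum_{i}2(x_{ij} - \mu_j)(-1) = 0, \qquad j = 1, \ldots, k$$

$$\frac{\partial}{\partial\sigma^2}\ln L = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{j}\sum_{i}(x_{ij} - \mu_j)^2 = 0$$

Solving these equations simultaneously we obtain the MLEs

$$\hat{\mu}_j = \bar{x}_j, \qquad \hat{\sigma}^2 = \sum_{j}\sum_{i}(x_{ij} - \bar{x}_j)^2/N$$

For the subspace Ω₀ under H₀, the μⱼ's have a common but unknown value,
say μ, and the MLEs are obtained by equating to zero the partials of ln L with
respect to μ and σ²:

$$\frac{\partial}{\partial\mu}\ln L = -\frac{1}{2\sigma^2}\sum_{j}\sum_{i}2(x_{ij} - \mu)(-1) = 0$$

$$\frac{\partial}{\partial\sigma^2}\ln L = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{j}\sum_{i}(x_{ij} - \mu)^2 = 0$$

leading to the solutions

$$\hat{\mu}_0 = \bar{x} = \sum_{j}n_j\bar{x}_j/N, \qquad \hat{\sigma}_0^2 = \sum_{j}\sum_{i}(x_{ij} - \bar{x})^2/N$$

Thus, the GLR is

$$\lambda(\mathbf{x}) = \frac{L(\hat{\mu}_0, \ldots, \hat{\mu}_0, \hat{\sigma}_0^2)}{L(\hat{\mu}_1, \ldots, \hat{\mu}_k, \hat{\sigma}^2)} = \frac{(2\pi\hat{\sigma}_0^2)^{-N/2}\exp(-N/2)}{(2\pi\hat{\sigma}^2)^{-N/2}\exp(-N/2)} = \left(\frac{\hat{\sigma}^2}{\hat{\sigma}_0^2}\right)^{N/2} = \left[\frac{\sum_j\sum_i(x_{ij} - \bar{x}_j)^2}{\sum_j\sum_i(x_{ij} - \bar{x})^2}\right]^{N/2} \qquad (12.8.4)$$

It is possible to express this test in terms of an F-distributed statistic. To show
this we first note the following identity (see Exercise 40):

$$\sum_{j}\sum_{i}(x_{ij} - \bar{x})^2 = \sum_{j}\sum_{i}(x_{ij} - \bar{x}_j)^2 + \sum_{j}n_j(\bar{x}_j - \bar{x})^2 \qquad (12.8.5)$$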


From this and equation (12.8.4), it follows that

$$\lambda(\mathbf{x}) = \left[1 + \frac{\sum_j n_j(\bar{x}_j - \bar{x})^2}{\sum_j\sum_i(x_{ij} - \bar{x}_j)^2}\right]^{-N/2}$$

which means that the GLR test is equivalent to rejecting H₀ if for some c > 0

$$\frac{\sum_j n_j(\bar{x}_j - \bar{x})^2}{\sum_j\sum_i(x_{ij} - \bar{x}_j)^2} \ge c \qquad (12.8.6)$$

Next we define variables V₁ = Σⱼ Σᵢ (xᵢⱼ − x̄)²/σ², V₂ = Σⱼ Σᵢ (xᵢⱼ − x̄ⱼ)²/σ², and
V₃ = Σⱼ nⱼ(x̄ⱼ − x̄)²/σ², and note that we can write V₂ = Σⱼ (nⱼ − 1)sⱼ²/σ². Thus, V₂
is a function of the sample variances and V₃ is a function of the sample means.
Because the x̄ⱼ's and sⱼ²'s are independent, it follows that V₂ and V₃ also are
independent. Furthermore, from identity (12.8.5) we know that V₁ = V₂ + V₃, so
from the results of Chapter 8 about sums of independent chi-square variables, we
have under H₀ that V₁ ~ χ²(N − 1) and V₂ ~ χ²(N − k), from which it follows
that V₃ ~ χ²[(N − 1) − (N − k)] = χ²(k − 1). Thus, the ratio on the left of
equation (12.8.6) is proportional to an F-distributed statistic, namely

$$F = \frac{\sum_j n_j(\bar{X}_j - \bar{X})^2/(k-1)}{\sum_j\sum_i(X_{ij} - \bar{X}_j)^2/(N-k)} \sim F(k-1, N-k) \qquad (12.8.7)$$

Finally, we note that the denominator of equation (12.8.7) is a pooled estimate of
σ², namely s_p² = Σⱼ (nⱼ − 1)sⱼ²/(N − k).

The above remarks are summarized in the following theorem:

Theorem 12.8.1 For k normal populations with common variance, N(μⱼ, σ²), j = 1, ..., k, a size α
test of H₀: μ₁ = ··· = μ_k is to reject H₀ if

$$f = \frac{\sum_{j=1}^{k}n_j(\bar{x}_j - \bar{x})^2/(k-1)}{s_p^2} \ge f_{1-\alpha}(k-1, N-k)$$

where s_p² = Σ_{j=1}^{k} (nⱼ − 1)sⱼ²/(N − k) and N = Σnⱼ. Furthermore, this test is
equivalent to the GLR test.
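The sums of squares in identity (12.8.5), the statistic (12.8.7), and the GLR (12.8.4)
can all be computed in a few lines, which also makes the equivalence
λ(x) = [1 + (k − 1)f/(N − k)]^{−N/2} easy to verify. A minimal Python sketch; the
function name and the three hypothetical groups are illustrative only, and the
F percentile itself would still come from tables.

```python
def one_way_anova(groups):
    """Sums of squares, the F statistic (12.8.7), and the GLR (12.8.4) for k samples."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(x for g in groups for x in g) / N
    means = [sum(g) / len(g) for g in groups]
    ss_total = sum((x - grand) ** 2 for g in groups for x in g)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    f = (ss_between / (k - 1)) / (ss_within / (N - k))
    lam = (ss_total / ss_within) ** (-N / 2)  # equation (12.8.4)
    return {"F": f, "lambda": lam, "ss_total": ss_total,
            "ss_within": ss_within, "ss_between": ss_between, "k": k, "N": N}


groups = [[4.2, 5.1, 4.8, 5.5], [6.0, 5.7, 6.4], [4.9, 5.3, 5.0, 4.6, 5.2]]
res = one_way_anova(groups)
print(round(res["F"], 3), res["lambda"])
```

The "between" and "within" pieces returned here are the same quantities that give the
analysis of variance its name.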


An important application of this theorem involves testing the effects of k
different experimental treatments. For example, suppose it is desired to test whether k
different brands of plant food are equally effective in promoting growth in garden
plants. If μⱼ is the mean growth per plant using brand j, then the test of Theorem
12.8.1 would be appropriate for testing whether the different brands of plant food
are equivalent in this respect.

This test also is related to a procedure called analysis of variance. This termino- 
logy is motivated by the identity (12.8.5). The term on the left reflects total varia- 
bility of the pooled sample data. On the other hand, the first term on the right 
reflects variation “within” the individual samples, while the second term reflects 
variation “between” the samples. Strictly speaking, this corresponds to a one-way 
analysis of variance because it only considers one factor, such as the brand of 
plant food. It also is possible to consider a second factor in the experiment, such 
as the amount of water applied to plants. The appropriate procedure in this case 
is called a two-way analysis of variance, but we will not pursue this point. 

The GLR approach also can be used to derive a test of equality of variances,
H₀: σ₁² = ··· = σ_k². We will not present this derivation, but an equivalent test
and an approximate distribution that is useful in performing such a test are given
in the following theorems.


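Theorem 12.8.2 Let

$$M = N\ln\left(\sum_{j=1}^{k}\nu_j w_j/N\right) - \sum_{j=1}^{k}\nu_j\ln w_j$$

and let

$$c = 1 + \frac{1}{3(k-1)}\left(\sum_{j=1}^{k}\frac{1}{\nu_j} - \frac{1}{N}\right)$$

where N = Σ_{j=1}^{k} νⱼ and νⱼwⱼ ~ χ²(νⱼ); then approximately

$$M/c \sim \chi^2(k-1) \quad \text{if } \nu_j > 4$$

For smaller νⱼ, critical values may be determined from Tables 31 and 32 of
Pearson and Hartley (1958).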

Now for normal samples, (nⱼ − 1)sⱼ²/σⱼ² ~ χ²(nⱼ − 1), and if one lets wⱼ = sⱼ²/σⱼ²
in M, then the σⱼ² will cancel out if they are all equal; thus M(nⱼ − 1, sⱼ²) may be
used as a test statistic for testing H₀: σ₁² = ··· = σ_k², and its distribution may be
expressed in terms of the chi-square distribution under H₀. This statistic is
minimized when all of the observed sⱼ² are equal, and the statistic becomes larger
for unequal sⱼ², which favors the alternative.
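The statistic M and the correction c are simple to compute from the degrees of
freedom and the sample variances. The Python sketch below assumes the forms of M and
c given above; the function name, sample sizes and variances are hypothetical. With
k = 3, the percentile χ²_{0.95}(2) = −2 ln 0.05 is available in closed form.

```python
import math


def bartlett_M_c(nus, ws):
    """M and c for degrees of freedom nus and values ws (here w_j = s_j^2)."""
    k, N = len(nus), sum(nus)
    pooled = sum(v * w for v, w in zip(nus, ws)) / N
    M = N * math.log(pooled) - sum(v * math.log(w) for v, w in zip(nus, ws))
    c = 1 + (sum(1 / v for v in nus) - 1 / N) / (3 * (k - 1))
    return M, c


n = [10, 12, 8]          # sample sizes n_j, so nu_j = n_j - 1
s2 = [4.1, 9.8, 3.2]     # sample variances s_j^2
M, c = bartlett_M_c([nj - 1 for nj in n], s2)
crit = -2 * math.log(0.05)  # chi2_{0.95}(2), valid because k - 1 = 2
print(round(M, 3), round(c, 4), M > c * crit)
```

Multiplying every sⱼ² by a common constant leaves M unchanged, which is the
cancellation of σ² noted in the text.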


Theorem 12.8.3 For k normal populations N(μⱼ, σⱼ²), j = 1, ..., k, an approximate size α test of
H₀: σ₁² = σ₂² = ··· = σ_k² against the alternative of at least one inequality is to
reject H₀ if

$$M(n_j - 1, s_j^2) > c\,\chi^2_{1-\alpha}(k-1)$$

12.9 CONDITIONAL TESTS


It sometimes is possible to eliminate unknown nuisance parameters and obtain
exact size α tests by considering tests based on conditional variables. For
example, if a sufficient statistic S exists for an unknown nuisance parameter θ,
then the distribution of X | S will not depend on θ. This technique will be
illustrated for the two-sample binomial test.

Example 12.9.1 Again let X ~ BIN(n₁, p₁) and Y ~ BIN(n₂, p₂) with X and Y independent. We
wish a size α test of H₀: p₁ = p₂ = p against Hₐ: p₁ < p₂. Under H₀, the joint
density of X and Y is

$$f(x, y) = \binom{n_1}{x}\binom{n_2}{y}p^{x+y}(1-p)^{n_1+n_2-(x+y)}$$

and it is clear that S = X + Y is sufficient for the common unknown p in this
density. This suggests considering a test based on the conditional distribution of
(X, Y) given S = s. Because Y = S − X, it suffices to base the test on the
conditional distribution of Y given S = s. Under H₀, S ~ BIN(n₁ + n₂, p), and thus

$$f_{Y|s}(y) = \frac{f_{S,Y}(s, y)}{f_S(s)} = \frac{f_{X,Y}(s - y, y)}{f_S(s)} = \frac{\binom{n_1}{s-y}\binom{n_2}{y}p^{s}(1-p)^{n_1+n_2-s}}{\binom{n_1+n_2}{s}p^{s}(1-p)^{n_1+n_2-s}} = \frac{\binom{n_1}{s-y}\binom{n_2}{y}}{\binom{n_1+n_2}{s}}, \qquad y = 0, \ldots, s;\ s = 0, \ldots, n_1 + n_2$$

which is a hypergeometric distribution. This distribution does not involve p, and
an exact size α critical region can be determined under H₀ for any given observed
value of s. For Hₐ: p₁ < p₂ the best critical region would be for large y. Thus,
reject H₀ if y ≥ k(s), or for a size α test, reject H₀ if

$$\sum_{i=y}^{s}\frac{\binom{n_1}{s-i}\binom{n_2}{i}}{\binom{n_1+n_2}{s}} \le \alpha$$

Tests for other alternatives can be obtained in a similar manner. Except for the
discreteness problem, this provides an exact size α test. In other words, it is exact
for values of α that the above sum can attain. Otherwise, the test is conservative
for the prescribed α.
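The conditional test only needs the hypergeometric upper-tail probability
P[Y ≥ y | S = s]. A minimal Python sketch; the function name and the counts x = 3 of
n₁ = 10 and y = 8 of n₂ = 10 are hypothetical.

```python
from math import comb


def conditional_upper_pvalue(x, n1, y, n2):
    """P[Y >= y | S = x + y] under H0: p1 = p2 (hypergeometric), for Ha: p1 < p2."""
    s = x + y
    total = comb(n1 + n2, s)
    # math.comb(n, r) returns 0 when r > n, which enforces the support limits
    return sum(comb(n1, s - i) * comb(n2, i) for i in range(y, min(s, n2) + 1)) / total


p = conditional_upper_pvalue(x=3, n1=10, y=8, n2=10)
print(round(p, 4), p <= 0.05)
```

Here s = 11 and the upper-tail sum is 5860/167960, about 0.0349, so H₀ would be
rejected at α = 0.05 without any knowledge of the common p.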


The following theorem is useful for constructing tests for hypotheses concerning
a parameter θ in the presence of nuisance parameters, κ = (κ₁, ..., κ_m).
Theorem 12.9.1 Let X = (X₁, ..., Xₙ) where X₁, ..., Xₙ have joint pdf of the form

$$f(\mathbf{x}; \theta, \boldsymbol{\kappa}) = c(\theta, \boldsymbol{\kappa})h(\mathbf{x})\exp\left[\theta t(\mathbf{x}) + \sum_{i=1}^{m}\kappa_i s_i(\mathbf{x})\right] \qquad (12.9.1)$$

If Sᵢ = sᵢ(X) for i = 1, ..., m, and T = t(X), then S₁, ..., S_m are jointly sufficient
for κ₁, ..., κ_m for each fixed θ, and the conditional pdf f_{T|s}(t; θ) does not depend
on κ. Furthermore,

1. A size α test of H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀ is to reject H₀ if
t(x) ≥ k(s), where P[T ≥ k(s) | s] = α when θ = θ₀.

2. A size α test of H₀: θ ≥ θ₀ versus Hₐ: θ < θ₀ is to reject H₀ if
t(x) ≤ k(s), where P[T ≤ k(s) | s] = α when θ = θ₀.

Under certain regularity conditions on equation (12.9.1), it also is possible to
show that these tests are UMP unbiased tests. For more details, see Lehmann
(1959).


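Example 12.9.2 Consider a random sample from a gamma distribution, $X_i \sim \mathrm{GAM}(\lambda, \kappa)$. If we reparameterize with $\theta = 1/\lambda$, then the joint pdf is

$$
\begin{aligned}
f(x; \theta, \kappa) &= \frac{\theta^{n\kappa}}{[\Gamma(\kappa)]^{n}} \left(\prod x_i\right)^{\kappa - 1} \exp\left(-\theta \sum x_i\right) \\
&= c(\theta, \kappa)\, h(x) \exp\left[\theta\left(-\sum x_i\right) + (\kappa - 1) \ln\left(\prod x_i\right)\right]
\end{aligned}
$$

where $h(x) = 1$ if all $x_i > 0$, and 0 otherwise. According to the theorem, $S = \ln\left(\prod X_i\right)$ is sufficient for $\kappa$ for any fixed $\theta$, and if $T = -\sum X_i$, then the distribution of $T$ given $S = s$ does not depend on $\kappa$. The conditional pdf $f_{T|s}(t)$ is quite complicated in this example, but tables that can be used to perform an equivalent test are given by Engelhardt and Bain (1977).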

Note that a conditional size $\alpha$ test also is a size $\alpha$ test unconditionally, because, for example, if $P[T \ge k(s) \mid s] = \alpha$, then

$$
P[T \ge k(S)] = E_S\{P[T \ge k(S) \mid S]\} = E_S(\alpha) = \alpha
$$

12.10 


SEQUENTIAL TESTS 


We found earlier that for a fixed sample size $n$ the Neyman-Pearson approach could be used to construct most powerful size $\alpha$ tests for simple hypotheses. Also, in some cases, formulas are available for computing the sample size $n$ that yields a size $\alpha$ test with a specified power, or equivalently, a specified level of Type II error, say $\beta$. In this section we consider a sequential test procedure in which the sample size is not fixed in advance.


Example 12.10.1


The force (in pounds) that will cause a certain type of ceramic automobile part to break is normally distributed with mean $\mu$ and standard deviation $\sigma = 40$. The original factory parts are known to have a mean breaking strength of 100 pounds. A manufacturer of replacement parts claims that its parts are better than the originals, and that the mean breaking strength of the replacement parts is 120 pounds. To demonstrate its claim, a test is proposed in which the breaking strength is measured for a fixed number $n$ of the replacement parts, yielding data $x_1, \ldots, x_n$. Based on the resulting data, the manufacturer decides to perform a size $\alpha = 0.05$ test of $H_0: \mu \le 100$ versus $H_1: \mu > 100$. We first consider the question of determining a fixed sample size based on methods discussed earlier in the chapter. It follows from Theorem 12.7.1 that a UMP test of size $\alpha$ for $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ rejects $H_0$ if $\bar{x} \ge c$ for an appropriate choice of $c$. Furthermore, we have from Theorem 12.3.1 that this is equivalent to rejecting $H_0$ if $z_0 = \sqrt{n}(\bar{x} - \mu_0)/\sigma \ge z_{1-\alpha}$, and that the sample size required to achieve a test with a specified $\beta = P[\text{Type II error}]$ is given by $n = (z_{1-\alpha} + z_{1-\beta})^2 \sigma^2/(\mu_0 - \mu)^2$. Thus, to have a size $\alpha = 0.05$ test of the replacement parts with $\beta = 0.10$ when $\mu = 120$, a sample of size $n = (1.645 + 1.282)^2(40)^2/(100 - 120)^2 \approx 34$ is required. Obviously, the cost of such a project will depend on the number of parts tested, which might lead one to seek a procedure that requires that fewer parts be tested. One possibility is to consider a sequential test. In other words, it might be possible to devise a procedure such that testing of the first few parts would produce sufficient evidence to accept or reject $H_0$ without the need for further testing.
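The fixed-sample-size formula is easy to evaluate directly. This Python sketch (the function name is illustrative) uses `statistics.NormalDist` for the standard normal percentiles instead of the rounded values 1.645 and 1.282. It gives about 34.3, which the text reports as 34; rounding up instead, which guarantees that $\beta$ does not exceed 0.10, would give 35.

```python
import math
from statistics import NormalDist


def fixed_sample_size(mu0, mu1, sigma, alpha, beta):
    """n = (z_{1-alpha} + z_{1-beta})^2 sigma^2 / (mu0 - mu1)^2 for the
    one-sided normal-mean test with known sigma (before rounding)."""
    z = NormalDist().inv_cdf  # standard normal percentile function
    return (z(1 - alpha) + z(1 - beta)) ** 2 * sigma**2 / (mu0 - mu1) ** 2


n = fixed_sample_size(100, 120, 40, 0.05, 0.10)
print(f"n = {n:.2f}; rounded up: {math.ceil(n)}")
```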


SEQUENTIAL PROBABILITY RATIO TESTS


Consider the situation of testing a simple null hypothesis $H_0: \theta = \theta_0$ against a simple alternative hypothesis $H_1: \theta = \theta_1$. If $X_1, \ldots, X_n$ is a random sample of size $n$ from a distribution with pdf $f(x; \theta)$, then we know from the Neyman-Pearson lemma that a most powerful critical region is determined by the inequality

$$
\frac{f(x_1; \theta_0) \cdots f(x_n; \theta_0)}{f(x_1; \theta_1) \cdots f(x_n; \theta_1)} \le k
$$

where $k$ is a positive constant.

A sequential probability ratio test (SPRT) is defined in terms of a sequence of such ratios. Specifically, we define

$$
\lambda_m = \lambda_m(x_1, \ldots, x_m) = \frac{f(x_1; \theta_0) \cdots f(x_m; \theta_0)}{f(x_1; \theta_1) \cdots f(x_m; \theta_1)}
$$
for $m = 1, 2, \ldots$, and adopt the following procedure: Let $k_0 < k_1$ be arbitrary positive numbers, and compute $\lambda_1$ based on the first observation $x_1$. If $\lambda_1 \le k_0$, then reject $H_0$; if $\lambda_1 \ge k_1$, then accept $H_0$; and if $k_0 < \lambda_1 < k_1$, then take a second observation $x_2$ and compute $\lambda_2$. Similarly, if $\lambda_2 \le k_0$, then reject $H_0$; if $\lambda_2 \ge k_1$, then accept $H_0$; and if $k_0 < \lambda_2 < k_1$, then take a third observation $x_3$ and compute $\lambda_3$, and so on. The idea is to continue taking $x_i$'s as long as the ratio $\lambda_m$ remains between $k_0$ and $k_1$, and to stop as soon as either $\lambda_m \le k_0$ or $\lambda_m \ge k_1$, rejecting $H_0$ if $\lambda_m \le k_0$ and accepting $H_0$ if $\lambda_m \ge k_1$. The critical region, say $C$, of the resulting sequential test is the union of the following disjoint sets:

$$
C_n = \{(x_1, \ldots, x_n) \mid k_0 < \lambda_j(x_1, \ldots, x_j) < k_1,\ j = 1, \ldots, n - 1;\ \lambda_n(x_1, \ldots, x_n) \le k_0\}
$$

for $n = 1, 2, \ldots$.

In other words, if for some $n$ a point $(x_1, \ldots, x_n)$ is in $C_n$, then $H_0$ is rejected for a sample of size $n$. On the other hand, $H_0$ is accepted if such a point is in an acceptance region, say $A$, which is the union of disjoint sets $A_n$ of the following form:

$$
A_n = \{(x_1, \ldots, x_n) \mid k_0 < \lambda_j(x_1, \ldots, x_j) < k_1,\ j = 1, \ldots, n - 1;\ \lambda_n(x_1, \ldots, x_n) \ge k_1\}
$$
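The stopping rule can be sketched in Python as a loop over incoming observations. The sketch works with $\ln \lambda_m$ rather than $\lambda_m$, since a product of many densities can underflow; comparing $\ln \lambda_m$ with $\ln k_0$ and $\ln k_1$ is equivalent because the logarithm is increasing. The names `sprt` and `normal_logpdf` are hypothetical, and the usage example assumes the normal model of Example 12.10.1 ($\mu_0 = 100$, $\mu_1 = 120$, $\sigma = 40$) with made-up data and $k_0 = 0.05/0.90$, $k_1 = 9.5$ (any $0 < k_0 < k_1$ would do).

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def sprt(data: Iterable[float],
         log_f0: Callable[[float], float],
         log_f1: Callable[[float], float],
         k0: float, k1: float) -> Tuple[Optional[str], int]:
    """Sequential probability ratio test of H0: theta0 vs H1: theta1.

    Stops as soon as lambda_m <= k0 (reject H0) or lambda_m >= k1 (accept H0).
    Returns (decision, m); decision is None if the data run out first.
    """
    if not 0 < k0 < k1:
        raise ValueError("need 0 < k0 < k1")
    log_k0, log_k1 = math.log(k0), math.log(k1)
    log_lam = 0.0  # ln(lambda_m), accumulated one observation at a time
    m = 0
    for x in data:
        m += 1
        log_lam += log_f0(x) - log_f1(x)
        if log_lam <= log_k0:
            return "reject H0", m
        if log_lam >= log_k1:
            return "accept H0", m
    return None, m


def normal_logpdf(mu: float, sigma: float) -> Callable[[float], float]:
    """Log density of N(mu, sigma^2) as a function of x."""
    c = math.log(sigma * math.sqrt(2 * math.pi))
    return lambda x: -0.5 * ((x - mu) / sigma) ** 2 - c


if __name__ == "__main__":
    f0, f1 = normal_logpdf(100, 40), normal_logpdf(120, 40)
    k0, k1 = 0.05 / 0.90, 9.5
    print(sprt([170, 170, 170, 170], f0, f1, k0, k1))  # each x adds -0.75 -> ('reject H0', 4)
    print(sprt([40, 40, 40], f0, f1, k0, k1))          # each x adds +0.875 -> ('accept H0', 3)
```

Because the test only ever looks at the running sum of log-ratio increments, the same function works for any model once its log density is supplied.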

In the case of the Neyman-Pearson test for fixed sample size $n$, the constant $k$ was determined so that the size of the test would be some prescribed $\alpha$. Now it is necessary to find constants $k_0$ and $k_1$ so that the SPRT will have prescribed values $\alpha$ and $\beta$ for the respective probabilities of Type I and Type II error,

$$
\alpha = P[\text{reject } H_0 \mid \theta_0] = \sum_{n=1}^{\infty} \int_{C_n} L_n(\theta_0) \qquad (12.10.1)
$$

and

$$
\beta = P[\text{accept } H_0 \mid \theta_1] = \sum_{n=1}^{\infty} \int_{A_n} L_n(\theta_1) \qquad (12.10.2)
$$

where $L_n(\theta) = f(x_1; \theta) \cdots f(x_n; \theta)$, and the integral notations are defined as follows:

$$
\int_{C_n} L_n(\theta_0) = \int \cdots \int_{C_n} f(x_1; \theta_0) \cdots f(x_n; \theta_0)\, dx_1 \cdots dx_n
$$

and

$$
\int_{A_n} L_n(\theta_1) = \int \cdots \int_{A_n} f(x_1; \theta_1) \cdots f(x_n; \theta_1)\, dx_1 \cdots dx_n
$$

The constants $k_0$ and $k_1$ are solutions of the integral equations (12.10.1) and (12.10.2), and, as might be expected, an exact determination of these constants is not trivial. Fortunately, there is a rather simple approximation available that we will consider shortly.
Before we proceed, there are a number of points to consider about SPRTs. In particular, because the sample size depends on the observed values of the sequence of random variables $X_1, X_2, \ldots$, it is itself a random variable, say $N$. As one might suspect, the distribution of $N$ is quite complicated, and we will not derive it. Another concern is the possibility that testing could continue indefinitely. Although we will not attempt a proof, it can be shown that an SPRT will terminate in a finite number of steps. Specifically, it can be shown that $N < \infty$ with probability 1. Of course, another point that was raised earlier concerns whether the size of the sample can be reduced by using a sequential test rather than a test with a fixed sample size. We will discuss this latter point after we consider the question of approximations for $k_0$ and $k_1$.


APPROXIMATE SEQUENTIAL TESTS 


Suppose it is required to perform a sequential test with prescribed probabilities of Type I and Type II errors, $\alpha$ and $\beta$, respectively. As noted above, the constants $k_0$ and $k_1$ can be obtained by solving the integral equations (12.10.1) and (12.10.2), and exact solutions, in general, will be difficult to achieve. Fortunately, it is possible to obtain approximate solutions that are much easier to compute and rather accurate. If $\alpha$ and $\beta$ are the exact levels desired, then we define constants

$$
k_0^* = \frac{\alpha}{1 - \beta} \quad \text{and} \quad k_1^* = \frac{1 - \alpha}{\beta}
$$
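In code, Wald's approximate constants are a one-line computation. The sketch below (function name hypothetical) also returns their logarithms, which are the boundaries an implementation working on the log scale would compare $\ln \lambda_m$ against. With $\alpha = 0.05$ and $\beta = 0.10$ it gives $k_0^* \approx 0.0556$ and $k_1^* = 9.5$. The check $\alpha + \beta < 1$ is exactly what makes $k_0^* < k_1^*$.

```python
import math


def wald_constants(alpha, beta):
    """Approximate SPRT constants k0* = alpha/(1 - beta), k1* = (1 - alpha)/beta,
    together with their natural logarithms."""
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, beta and alpha + beta < 1")
    k0_star = alpha / (1 - beta)
    k1_star = (1 - alpha) / beta
    return k0_star, k1_star, math.log(k0_star), math.log(k1_star)


k0s, k1s, log_k0s, log_k1s = wald_constants(0.05, 0.10)
print(f"k0* = {k0s:.4f}, k1* = {k1s:.2f}, ln k0* = {log_k0s:.3f}, ln k1* = {log_k1s:.3f}")
```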

The following discussion suggests using $k_0^*$ and $k_1^*$ as approximations for $k_0$ and $k_1$. Given the above stated property that $N < \infty$ with probability 1 and that $\lambda_n(x_1, \ldots, x_n) \le k_0$ when $(x_1, \ldots, x_n)$ is in $C_n$, it follows that


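$$
\begin{aligned}
\alpha = P[\text{reject } H_0 \mid \theta_0] &= \sum_{n=1}^{\infty} \int_{C_n} L_n(\theta_0) \le \sum_{n=1}^{\infty} \int_{C_n} k_0 L_n(\theta_1) \\
&= k_0 \sum_{n=1}^{\infty} \int_{C_n} L_n(\theta_1) \\
&= k_0 P[\text{reject } H_0 \mid \theta_1] \\
&= k_0 (1 - \beta)
\end{aligned}
$$

and hence $\alpha/(1 - \beta) \le k_0$. Similarly, because $\lambda_n(x_1, \ldots, x_n) \ge k_1$ when $(x_1, \ldots, x_n)$ is in $A_n$, it follows that

$$
\begin{aligned}
1 - \alpha = P[\text{accept } H_0 \mid \theta_0] &= \sum_{n=1}^{\infty} \int_{A_n} L_n(\theta_0) \ge \sum_{n=1}^{\infty} \int_{A_n} k_1 L_n(\theta_1) \\
&= k_1 P[\text{accept } H_0 \mid \theta_1] \\
&= k_1 \beta
\end{aligned}
$$

and hence $k_1 \le (1 - \alpha)/\beta$. These results imply the inequality $k_0^* \le k_0 < k_1 \le k_1^*$.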
A relationship now will be established between the errors for the exact test and those of the approximate test. Denote by $\alpha^*$ and $\beta^*$ the actual error sizes of the approximate SPRT based on using the constants $k_0^*$ and $k_1^*$. Also, denote by $C_n^*$ and $A_n^*$ the sets that define, respectively, the critical and acceptance regions for the approximate test based on $k_0^*$ and $k_1^*$. It follows by an argument similar to that given above that

$$
\alpha^* = \sum_{n=1}^{\infty} \int_{C_n^*} L_n(\theta_0) \le k_0^* \sum_{n=1}^{\infty} \int_{C_n^*} L_n(\theta_1) = \frac{\alpha}{1 - \beta}\,(1 - \beta^*)
$$

and

$$
1 - \alpha^* = \sum_{n=1}^{\infty} \int_{A_n^*} L_n(\theta_0) \ge k_1^* \sum_{n=1}^{\infty} \int_{A_n^*} L_n(\theta_1) = \frac{1 - \alpha}{\beta}\, \beta^*
$$

It follows that $\alpha^*(1 - \beta) \le \alpha(1 - \beta^*)$ and $(1 - \alpha)\beta^* \le (1 - \alpha^*)\beta$, and consequently that $\alpha^*(1 - \beta) + (1 - \alpha)\beta^* \le \alpha(1 - \beta^*) + (1 - \alpha^*)\beta$. The terms $-\alpha^*\beta$ and $-\alpha\beta^*$ appear on both sides and cancel, which yields the inequality

$$
\alpha^* + \beta^* \le \alpha + \beta
$$

Thus, if the experimenter uses the approximate SPRT based on the constants $k_0^* = \alpha/(1 - \beta)$ and $k_1^* = (1 - \alpha)/\beta$ rather than the exact SPRT based on $k_0$ and $k_1$, then the sum of the error probabilities of the approximate test is bounded above by the sum of the error probabilities of the exact test.
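The inequality $\alpha^* + \beta^* \le \alpha + \beta$ can be checked by simulation. The Python sketch below assumes the normal setting of Example 12.10.1 ($\mu_0 = 100$, $\mu_1 = 120$, $\sigma = 40$, $\alpha = 0.05$, $\beta = 0.10$), runs the approximate SPRT with $k_0^*$ and $k_1^*$ many times under each hypothesis, and reports the estimated error rates. The increment $\ln f(x; \mu_0) - \ln f(x; \mu_1)$ is written in the closed form $(\mu_0 - \mu_1)[x - (\mu_0 + \mu_1)/2]/\sigma^2$, which follows from the normal pdf. The estimates are Monte Carlo approximations, so they vary slightly with the seed and number of replications; the names are hypothetical.

```python
import math
import random

MU0, MU1, SIGMA = 100.0, 120.0, 40.0
ALPHA, BETA = 0.05, 0.10
LOG_K0 = math.log(ALPHA / (1 - BETA))  # ln k0*
LOG_K1 = math.log((1 - ALPHA) / BETA)  # ln k1*


def z_increment(x):
    """ln f(x; mu0) - ln f(x; mu1) for normal data with known sigma."""
    return (MU0 - MU1) / SIGMA**2 * (x - (MU0 + MU1) / 2)


def rejects_h0(mu, rng):
    """Run one approximate SPRT on N(mu, SIGMA^2) data; True if H0 is rejected."""
    s = 0.0
    while True:  # terminates with probability 1
        s += z_increment(rng.gauss(mu, SIGMA))
        if s <= LOG_K0:
            return True
        if s >= LOG_K1:
            return False


def estimate_errors(reps=20000, seed=12345):
    """Monte Carlo estimates of alpha* = P[reject H0 | mu0] and
    beta* = P[accept H0 | mu1] for the approximate SPRT."""
    rng = random.Random(seed)
    alpha_star = sum(rejects_h0(MU0, rng) for _ in range(reps)) / reps
    beta_star = sum(not rejects_h0(MU1, rng) for _ in range(reps)) / reps
    return alpha_star, beta_star


if __name__ == "__main__":
    a_star, b_star = estimate_errors()
    print(f"alpha* ~ {a_star:.4f}, beta* ~ {b_star:.4f}, sum ~ {a_star + b_star:.4f}")
```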


EXPECTED SAMPLE SIZE 


We now consider a way of assessing the effectiveness of SPRTs in reducing the 
amount of sampling relative to tests based on fixed sample sizes. Our criterion 
involves the expected number of observations required to reach a decision. 

As before, we denote by $N$ the number of observations required to reach a decision, either reject $H_0$ or accept $H_0$. Theoretically, we might attempt to compute its expectation directly from the definition, but as noted previously the distribution of $N$ is quite complicated and thus we will resort to a different approach. Recall that the test is based on observed values of a sequence of random variables $X_1, X_2, \ldots$, which are independent and identically distributed with pdf $f(x; \theta)$. Theoretically, we could continue taking observations indefinitely, but according to the sequential procedure defined above, we will terminate as soon as $\lambda_n \le k_0$ or $\lambda_n \ge k_1$ for some $n$, and we define $N$ as the first such value $n$.

We now define a new random variable, say $Z = \ln f(X; \theta_0) - \ln f(X; \theta_1)$, where $X \sim f(x; \theta)$ for either $\theta = \theta_0$ or $\theta_1$. In a similar manner, we can define a whole sequence of such random variables $Z_1, Z_2, \ldots$, based on the sequence $X_1, X_2, \ldots$, and we also can define a sequence of sums, $S_m = \sum_{i=1}^{m} Z_i$ for $m \ge 1$.
i=1 
Notice that these sums are related to the likelihood ratios:

$$
S_m = \ln[\lambda_m(X_1, \ldots, X_m)], \qquad m = 1, 2, \ldots
$$

It follows that $N$ is the subscript of the first sum $S_n$ such that either $S_n \le \ln(k_0)$ or $S_n \ge \ln(k_1)$, and we denote the corresponding sum as $S_N = Z_1 + \cdots + Z_N$. It is possible to show that $E(S_N) = E(N)E(Z)$ when $E(N) < \infty$. This relationship, which is known as Wald's equation, is useful in deriving an approximation to the expected sample size. We will not attempt to prove it here.
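Wald's equation can be illustrated numerically, though not proved, by simulation. This sketch assumes the normal setting of Example 12.10.1 with the approximate constants $k_0^* = 0.05/0.90$ and $k_1^* = 9.5$, records $S_N$ and $N$ for many runs, and compares the average of $S_N$ with the average of $N$ times $E(Z)$; the two should agree up to Monte Carlo error. For normal data, $E(Z \mid \mu) = (\mu_0 - \mu_1)[\mu - (\mu_0 + \mu_1)/2]/\sigma^2$, from the closed form of the increments. The function names are hypothetical.

```python
import math
import random

MU0, MU1, SIGMA = 100.0, 120.0, 40.0
LOG_K0 = math.log(0.05 / 0.90)  # ln k0*
LOG_K1 = math.log(0.95 / 0.10)  # ln k1*


def expected_z(mu):
    """E(Z) when X ~ N(mu, SIGMA^2), Z = ln f(X; mu0) - ln f(X; mu1)."""
    return (MU0 - MU1) / SIGMA**2 * (mu - (MU0 + MU1) / 2)


def run_sprt(mu, rng):
    """Return (S_N, N) for one SPRT run on N(mu, SIGMA^2) data."""
    s, n = 0.0, 0
    while LOG_K0 < s < LOG_K1:  # s = 0 starts strictly inside the boundaries
        n += 1
        x = rng.gauss(mu, SIGMA)
        s += (MU0 - MU1) / SIGMA**2 * (x - (MU0 + MU1) / 2)
    return s, n


def wald_check(mu, reps=20000, seed=2024):
    """Return (average S_N, average N * E(Z)) over reps simulated runs."""
    rng = random.Random(seed)
    runs = [run_sprt(mu, rng) for _ in range(reps)]
    mean_s = sum(s for s, _ in runs) / reps
    mean_n = sum(n for _, n in runs) / reps
    return mean_s, mean_n * expected_z(mu)


if __name__ == "__main__":
    lhs, rhs = wald_check(MU0)
    print(f"E(S_N) ~ {lhs:.3f}   E(N)E(Z) ~ {rhs:.3f}")
```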

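If the sequential test rejects $H_0$ at step $N$, then $S_N \le \ln(k_0)$, and we would expect the sum to be close to $\ln(k_0)$, because it first dropped below this value at the $N$th step. Similarly, if the test accepts $H_0$ at step $N$, then $S_N \ge \ln(k_1)$, and we would expect the sum to be close to $\ln(k_1)$ in this case. These remarks together with Wald's equation suggest the following approximation:

$$
E(N) = \frac{E(S_N)}{E(Z)} \approx \frac{\ln(k_0)\, P[\text{reject } H_0] + \ln(k_1)\, P[\text{accept } H_0]}{E(Z)}
$$

By using the approximations $k_0 \approx k_0^* = \alpha/(1 - \beta)$ and $k_1 \approx k_1^* = (1 - \alpha)/\beta$, together with $P[\text{reject } H_0] = \alpha$ and $P[\text{accept } H_0] = 1 - \alpha$ when $H_0$ is true, we obtain the following approximation to expected sample size when $H_0$ is true:

$$
E(N \mid \theta_0) \approx \frac{\alpha \ln[\alpha/(1 - \beta)] + (1 - \alpha) \ln[(1 - \alpha)/\beta]}{E(Z \mid \theta_0)}
$$

Similarly, using $P[\text{reject } H_0] = 1 - \beta$ and $P[\text{accept } H_0] = \beta$, an approximation when $H_1$ is true is given by

$$
E(N \mid \theta_1) \approx \frac{(1 - \beta) \ln[\alpha/(1 - \beta)] + \beta \ln[(1 - \alpha)/\beta]}{E(Z \mid \theta_1)}
$$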
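Each approximation is a probability-weighted average of $\ln k_0^*$ and $\ln k_1^*$ divided by the mean increment under the corresponding hypothesis. A minimal Python version, with a hypothetical function name, takes $E(Z \mid \theta_0)$ and $E(Z \mid \theta_1)$ as inputs, since these depend on the model. For instance, `approx_expected_n(0.05, 0.10, 0.125, -0.125)` returns roughly (15.95, 19.01); these particular inputs are the mean increments for the normal example worked out in the text.

```python
import math


def approx_expected_n(alpha, beta, ez0, ez1):
    """Wald's approximations to E(N | theta0) and E(N | theta1).

    ez0 = E(Z | theta0) and ez1 = E(Z | theta1), where
    Z = ln f(X; theta0) - ln f(X; theta1).
    """
    log_k0 = math.log(alpha / (1 - beta))   # ln k0*
    log_k1 = math.log((1 - alpha) / beta)   # ln k1*
    # Under H0: P[reject H0] = alpha, P[accept H0] = 1 - alpha
    en0 = (alpha * log_k0 + (1 - alpha) * log_k1) / ez0
    # Under H1: P[reject H0] = 1 - beta, P[accept H0] = beta
    en1 = ((1 - beta) * log_k0 + beta * log_k1) / ez1
    return en0, en1
```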


We consider again the problem of Example 12.10.1, which dealt with the force (in pounds) required to break a certain type of ceramic part whose breaking strength is normally distributed with mean $\mu$ and standard deviation $\sigma = 40$. Suppose we wish to test the simple null hypothesis $H_0: \mu = 100$ versus the simple alternative $H_1: \mu = 120$ with $\alpha = 0.05$ and $\beta = 0.10$. Thus, the approximate critical values for an SPRT are $k_0^* = 0.05/(1 - 0.10) = 0.056$ and $k_1^* = (1 - 0.05)/0.10 = 9.5$, and such a test would reject $H_0$ as soon as $\lambda_n \le 0.056$, and accept $H_0$ as soon as $\lambda_n \ge 9.5$. In this case, it also is possible to express the test in terms of the sum of the data. Specifically, because $f(x; \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right]$, we can
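write

$$
\begin{aligned}
z_i &= \ln f(x_i; \mu_0) - \ln f(x_i; \mu_1) \\
&= \frac{1}{2\sigma^2}(x_i - \mu_1)^2 - \frac{1}{2\sigma^2}(x_i - \mu_0)^2 \\
&= \frac{1}{2\sigma^2}\left[2x_i(\mu_0 - \mu_1) - (\mu_0^2 - \mu_1^2)\right] \\
&= \frac{\mu_0 - \mu_1}{2\sigma^2}\left[2x_i - (\mu_0 + \mu_1)\right]
\end{aligned}
$$

It follows that

$$
s_n = \sum_{i=1}^{n} z_i = \frac{\mu_0 - \mu_1}{2\sigma^2}\left[2\sum_{i=1}^{n} x_i - n(\mu_0 + \mu_1)\right]
$$

which is a linear function of $\sum x_i$. Thus, the criterion of stopping the test if $s_n \le \ln(k_0^*)$ or $s_n \ge \ln(k_1^*)$ is equivalent to stopping if $\sum x_i \le c_0(n)$ or $\sum x_i \ge c_1(n)$, with $c_0(n)$ and $c_1(n)$ determined by $n$, $k_0^*$, $k_1^*$, $\mu_0$, $\mu_1$, and $\sigma^2$.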
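The boundaries $c_0(n)$ and $c_1(n)$ can be made explicit by solving the two inequalities for $\sum x_i$. Because $\mu_0 < \mu_1$ in this example, the coefficient $(\mu_0 - \mu_1)/\sigma^2$ is negative, so the inequalities flip: small values of $s_n$ (rejecting $H_0$) correspond to large values of $\sum x_i$. The Python sketch below (function name hypothetical, and it assumes $\mu_1 > \mu_0$) returns both boundaries for a given $n$; for $n = 1$ it gives roughly $c_0 = -70.10$ and $c_1 = 341.23$, and each later boundary moves up by $(\mu_0 + \mu_1)/2 = 110$ per observation.

```python
import math


def sum_boundaries(n, mu0, mu1, sigma, alpha, beta):
    """Return (c0(n), c1(n)) for the approximate SPRT of H0: mu = mu0 vs
    H1: mu = mu1 > mu0 with known sigma: accept H0 once sum(x) <= c0(n),
    reject H0 once sum(x) >= c1(n), otherwise take another observation."""
    if not mu1 > mu0:
        raise ValueError("this sketch assumes mu1 > mu0")
    log_k0 = math.log(alpha / (1 - beta))   # ln k0*
    log_k1 = math.log((1 - alpha) / beta)   # ln k1*
    mid = n * (mu0 + mu1) / 2
    # s_n = (mu0 - mu1) / sigma^2 * (sum(x) - mid); the factor is negative,
    # so solving s_n <= ln k0* or s_n >= ln k1* for sum(x) flips the inequality.
    scale = sigma**2 / (mu0 - mu1)
    c1 = mid + scale * log_k0   # s_n <= ln k0*  <=>  sum(x) >= c1(n)
    c0 = mid + scale * log_k1   # s_n >= ln k1*  <=>  sum(x) <= c0(n)
    return c0, c1


for n in (1, 5, 10):
    c0, c1 = sum_boundaries(n, 100, 120, 40, 0.05, 0.10)
    print(f"n = {n:2d}: accept H0 if sum <= {c0:8.2f}, reject H0 if sum >= {c1:8.2f}")
```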

It also would be interesting to approximate the expected sample size and compare this to the sample size required to achieve the corresponding test with a fixed sample size. The expression given above for $z_i$ also can be used to obtain $E(Z)$. Specifically, it follows that

$$
E(Z) = \frac{\mu_0 - \mu_1}{2\sigma^2}\left[2E(X) - (\mu_0 + \mu_1)\right]
$$

and thus,

$$
E(Z \mid \mu_i) = \frac{\mu_0 - \mu_1}{2\sigma^2}\left[2\mu_i - (\mu_0 + \mu_1)\right], \qquad i = 0, 1
$$

As a result, we have that $E(Z \mid \mu_i) = (-1)^i (\mu_0 - \mu_1)^2/(2\sigma^2)$ for $i = 0, 1$, and in our example,

$$
E(N \mid \mu_0 = 100) \approx \frac{0.05 \ln[0.05/(1 - 0.10)] + (1 - 0.05) \ln[(1 - 0.05)/0.10]}{(100 - 120)^2/[2(40)^2]} \approx 16
$$

and

$$
E(N \mid \mu_1 = 120) \approx \frac{(1 - 0.10) \ln[0.05/(1 - 0.10)] + 0.10 \ln[(1 - 0.05)/0.10]}{-(100 - 120)^2/[2(40)^2]} \approx 19
$$
For the SPRT, the expected sample sizes 16 and 19 under $H_0$ and $H_1$, respectively, compare to the sample size $n = 34$ for the corresponding Neyman-Pearson test considered in Example 12.10.1.
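The comparison can be reproduced in a few lines of Python. This script (variable names are illustrative) evaluates Wald's approximations for the normal example and the fixed-sample-size formula from Example 12.10.1 side by side; it prints `E(N | mu0) ~ 16.0, E(N | mu1) ~ 19.0, fixed n ~ 34.3`. Keep in mind that Wald's approximations ignore the amount by which $S_N$ overshoots the boundaries, so they are approximations to the expected sample sizes rather than exact values.

```python
import math
from statistics import NormalDist

MU0, MU1, SIGMA, ALPHA, BETA = 100.0, 120.0, 40.0, 0.05, 0.10

# Approximate SPRT expected sample sizes, using E(Z | mu_i) = (-1)^i (mu0 - mu1)^2 / (2 sigma^2)
ez0 = (MU0 - MU1) ** 2 / (2 * SIGMA**2)
ez1 = -ez0
log_k0 = math.log(ALPHA / (1 - BETA))   # ln k0*
log_k1 = math.log((1 - ALPHA) / BETA)   # ln k1*
en0 = (ALPHA * log_k0 + (1 - ALPHA) * log_k1) / ez0
en1 = ((1 - BETA) * log_k0 + BETA * log_k1) / ez1

# Fixed sample size of the corresponding one-sided z test (before rounding)
z = NormalDist().inv_cdf
n_fixed = (z(1 - ALPHA) + z(1 - BETA)) ** 2 * SIGMA**2 / (MU0 - MU1) ** 2

print(f"E(N | mu0) ~ {en0:.1f}, E(N | mu1) ~ {en1:.1f}, fixed n ~ {n_fixed:.1f}")
```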


For additional reading about sequential tests, the book by Ghosh (1970) is 
recommended. 


SUMMARY 


Our purpose in this chapter was to introduce the concept of hypothesis testing, which corresponds to a process of attempting to determine the truth or falsity of specified statistical hypotheses on the basis of experimental evidence. A statistical hypothesis is a statement about the distribution of the random variable that models the characteristic of interest. If the hypothesis completely specifies the distribution, then it is called a simple hypothesis; otherwise it is called composite. A Type I error occurs when a true null hypothesis is rejected by the test, and a Type II error occurs when a test fails to reject a false null hypothesis. It is not possible to avoid an occasional decision error, but it is possible in many situations to design tests that lead to such errors with a specified relative frequency.

If a test is based on a set of data consisting of n measurements, then the critical region (or rejection region) is a subset of an n-dimensional Euclidean space. The null hypothesis is rejected when the n-dimensional vector of data is contained in the critical region. Often it is possible to express the critical region in terms of a test statistic. For a simple hypothesis, the significance level is the probability of committing a Type I error. In the case of a composite hypothesis, the size of the test (or size of the associated critical region) is the largest probability of a Type I error relative to all distributions specified in the null hypothesis.

The power function gives the probability of rejecting a false null hypothesis for the different alternative values. By means of the Neyman-Pearson lemma, it is possible to derive a most powerful test of a given size, and in some cases this test is a UMP test.

In many cases, a UMP test cannot be obtained, but it is possible to derive 
reasonable tests by means of the generalized likelihood ratio approach. For 
example, this approach can be used in many cases to derive tests of hypotheses 
where nuisance parameters are present, and also in situations involving two-sided 
alternatives. In a few cases, it is possible to derive UMP unbiased tests that also 
can be used in nuisance parameter problems and with two-sided alternative 
hypotheses. 






EXERCISES 


Suppose $X_1, \ldots, X_{16}$ is a random sample of size $n = 16$ from a normal distribution, $X_i \sim N(\mu, 1)$, and we wish to test $H_0: \mu = 20$ at significance level $\alpha = 0.05$, based on the sample mean $\bar{X}$.

(a) Determine critical regions of the form $A = \{\bar{x} \mid -\infty < \bar{x} \le a\}$ and $B = \{\bar{x} \mid b \le \bar{x} < \infty\}$.

(b) Find the probability of a Type II error, $\beta = P[\text{TII}]$, for each critical region in (a) for the alternative $H_1: \mu = 21$. Which of these critical regions is unreasonable for this alternative?

(c) Rework (b) for the alternative $H_1: \mu = 19$.

(d) What is the significance level for a test with the critical region $A \cup B$?

(e) What is $\beta = P[\text{TII}]$ for a test with critical region $A \cup B$ if $|\mu - 20| = 1$?


Suppose a box contains four marbles, $\theta$ white ones and $4 - \theta$ black ones. Test $H_0: \theta = 2$ against $H_1: \theta \ne 2$ as follows: Draw two marbles with replacement and reject $H_0$ if both marbles are the same color; otherwise do not reject.

(a) Compute the probability of Type I error.

(b) Compute the probability of Type II error for all possible situations.

(c) Rework (a) and (b) if the two marbles are drawn without replacement.


Consider a random sample of size 20 from a normal distribution, $X_i \sim N(\mu, \sigma^2)$, and suppose that $\bar{x} = 11$ and $s^2 = 16$.
(a) Assuming it is known that $\sigma^2 = 4$, test $H_0: \mu \ge 12$ versus $H_1: \mu < 12$ at the significance level $\alpha = 0.01$.
(b) What is $\beta = P[\text{TII}]$ if in fact $\mu = 10.5$?
(c) What sample size is needed for the power of the test to be 0.90 for the alternative value $\mu = 10.5$?
(d) Test the hypotheses of (a) assuming $\sigma^2$ unknown.
(e) Test $H_0: \sigma^2 \le 9$ against $H_1: \sigma^2 > 9$ with significance level $\alpha = 0.01$.
(f) What sample size is needed for the test of (e) to be 90% certain of rejecting $H_0$ when in fact $\sigma^2 = 18$? What is $\beta = P[\text{TII}]$ in this case?


Consider the biased coin discussed in Example 9.2.5, where the probability of a head, $p$, is known to be 0.20, 0.30, or 0.80. The coin is tossed repeatedly, and we let $X$ be the number of tosses required to obtain the first head. To test $H_0: p = 0.80$, suppose we reject $H_0$ if $X > 3$, and do not reject otherwise.
(a) What is the probability of Type I error, $P[\text{TI}]$?
(b) What is the probability of a Type II error, $P[\text{TII}]$, for each of the other two values of $p$?
(c) For a test of $H_0: p = 0.30$, suppose we use a critical region of the form $\{1, 14, 15, \ldots\}$. Find $P[\text{TI}]$, and also find $P[\text{TII}]$ for each of the other values of $p$.


It is desired to test the hypothesis that the mean melting point of an alloy is 200 degrees Celsius (°C) so that a difference of 20°C is detected with probability 0.95. Assuming normality, and with an initial guess that $\sigma = 25$°C, how many specimens of the alloy must be tested to perform a $t$ test with $\alpha = 0.01$? Describe the null and alternative hypotheses, and the form of the associated critical region.


Let $X_1, \ldots, X_n$ be a random sample of size $n$ from an exponential distribution, $X_i \sim \mathrm{EXP}(1, \eta)$. A test of $H_0: \eta \le \eta_0$ versus $H_1: \eta > \eta_0$ is desired, based on $X_{1:n}$.
(a) Find a critical region of size $\alpha$ of the form $\{x_{1:n} \ge c\}$.
(b) Derive the power function for the test of (a).
(c) Derive a formula to determine the sample size $n$ for a test of size $\alpha$ with $\beta = P[\text{TII}]$ if $\eta = \eta_1$.


A coin is tossed 20 times and $x = 6$ heads are observed. Let $p = P(\text{head})$. A test of $H_0: p \ge 0.5$ versus $H_1: p < 0.5$ of size at most 0.10 is desired.
(a) Perform a test using Theorem 12.4.1.
(b) Perform a test using Theorem 12.4.2.
(c) What is the power of a size $\alpha = 0.0577$ test of $H_0: p \ge 0.5$ for the alternative $p = 0.2$?
(d) What is the $p$-value for the test in (b)? That is, what is the observed size?


Suppose that the number of defects in a piece of wire of length $t$ yards is Poisson distributed, $X \sim \mathrm{POI}(\lambda t)$, and one defect is found in a 100-yard piece of wire.
(a) Test $H_0: \lambda \ge 0.05$ against $H_1: \lambda < 0.05$ with significance level at most 0.01, by means of Theorem 12.3.1.
(b) What is the $p$-value for such a test?
(c) Suppose a total of two defects are found in two 100-yard pieces of wire. Test $H_0: \lambda \ge 0.05$ versus $H_1: \lambda < 0.05$ at significance level $\alpha = 0.0103$.
(d) Find the power of the test in (c) if $\lambda = 0.01$.


Consider independent random samples from two normal distributions, $X_i \sim N(\mu_1, \sigma_1^2)$ for $i = 1, \ldots, n_1$, and $Y_j \sim N(\mu_2, \sigma_2^2)$ for $j = 1, \ldots, n_2$. Let $n_1 = n_2 = 9$, $\bar{x} = 16$, $\bar{y} = 10$, $s_1^2 = 36$, and $s_2^2 = 45$.
(a) Assuming equal variances, test $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$ at the $\alpha = 0.10$ level of significance.
(b) Perform an approximate $\alpha = 0.10$ level test of $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$ using equation (11.5.13).
(c) Perform a test of these hypotheses at the $\alpha = 0.10$ significance level using equation (11.5.17), assuming the data were obtained from paired samples with $s_d^2 = 81$.
(d) Test $H_0: \sigma_2^2/\sigma_1^2 \le 1$ versus $H_1: \sigma_2^2/\sigma_1^2 > 1$ at the $\alpha = 0.05$ level.
(e) Use Table 7 (Appendix C) to find the power of this test if $\sigma_2^2/\sigma_1^2 = 1.33$.


A certain type of component is manufactured by two different companies, and the respective probabilities of a nondefective component are $p_1$ and $p_2$. In random samples of 200 components each, 180 from company 1 are nondefective, and 190 from company 2 are nondefective. Test $H_0: p_1 = p_2$ against $H_1: p_1 \ne p_2$ at significance level $\alpha = 0.05$.
Consider a distribution with pdf $f(x; \theta) = \theta x^{\theta - 1}$ if $0 < x < 1$ and zero otherwise.
(a) Based on a random sample of size $n = 1$, find the most powerful test of $H_0: \theta = 1$ against $H_1: \theta = 2$ with $\alpha = 0.05$.
(b) Compute the power of the test in (a) for the alternative $\theta = 2$.
(c) Derive the most powerful test for the hypotheses of (a) based on a random sample of size $n$.


Suppose that $X \sim \mathrm{POI}(\mu)$.
(a) Derive the most powerful test of $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1$ ($\mu_1 > \mu_0$) based on an observed value of $X$.
(b) Rework (a) based on a random sample of size $n$.


Let $X \sim \mathrm{NB}(r, 1/2)$.
(a) Derive the most powerful test of size $\alpha = 0.125$ of $H_0: r = 1$ against $H_1: r = 2$ based on an observed value of $X$.
(b) Compute the power of this test for the alternative $r = 2$.


Assume that $X$ is a discrete random variable.
(a) Based on an observed value of $X$, derive the most powerful test of $H_0: X \sim \mathrm{GEO}(0.05)$ versus $H_1: X \sim \mathrm{POI}(0.95)$ with $\alpha = 0.0975$.
(b) Find the power of this test under the alternative.


Let $X_1, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n; \theta)$ and $S$ be a sufficient statistic for $\theta$. Show that a most powerful test of $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$ can be expressed in terms of $S$.


Consider a random sample of size $n$ from a distribution with pdf $f(x; \theta) = (3x^2/\theta)e^{-x^3/\theta}$ if $0 < x$, and zero otherwise. Derive the form of the critical region for a uniformly most powerful (UMP) test of size $\alpha$ of $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$.


Suppose that $X_1, \ldots, X_n$ is a random sample from a normal distribution, $X_i \sim N(0, \sigma^2)$.
(a) Derive the UMP size $\alpha$ test of $H_0: \sigma = \sigma_0$ against $H_1: \sigma > \sigma_0$.
(b) Express the power function of this test in terms of a chi-square distribution.
(c) If $n = 20$, $\sigma_0 = 1$, and $\alpha = 0.005$, use Table 5 (Appendix C) to compute the power of the test in (a) when $\sigma = 2$.


Consider a random sample of size $n$ from a uniform distribution, $X_i \sim \mathrm{UNIF}(0, \theta)$. Find the UMP test of size $\alpha$ of $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$ by first deriving a most powerful test of simple hypotheses and then extending it to composite hypotheses.


Let $X_1, \ldots, X_n$ be a random sample from a normal distribution, $X_i \sim N(\mu, 1)$.
(a) Find a UMP test of $H_0: \mu = \mu_0$ against $H_1: \mu < \mu_0$.
(b) Find a UMP test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$.
(c) Show that there is no UMP test of $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$.


Suppose that $X$ is a continuous random variable with pdf $f(x; \theta) = 1 - \theta^2(x - 1/2)$ if $0 < x < 1$ and zero otherwise, where $-1 < \theta < 1$. Show that a UMP size $\alpha$ test of $H_0: \theta = 0$ versus $H_1: \theta \ne 0$, based on an observed value $x$ of $X$, is to reject $H_0$ if $x < \alpha$.
Consider a random sample $X_1, \ldots, X_n$ from a discrete distribution with pdf $f(x; \theta) = [\theta/(\theta + 1)]^x/(\theta + 1)$ if $x = 0, 1, \ldots$, where $\theta > 0$. Find a UMP test of $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$.


Let $X_1, \ldots, X_n$ denote a random sample from a gamma distribution, $X_i \sim \mathrm{GAM}(\theta, \kappa)$.
(a) If $\kappa$ is known, derive a UMP size $\alpha$ test of $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$.
(b) Express the power function for the test in (a) in terms of a chi-square distribution.
(c) If $n = 4$, $\kappa = 2$, $\theta_0 = 1$, and $\alpha = 0.01$, use Table 5 (Appendix C) to find the power of this test when $\theta = 2$.
(d) If $\theta$ is known, derive a UMP size $\alpha$ test of $H_0: \kappa \le \kappa_0$ against $H_1: \kappa > \kappa_0$.


Consider a random sample of size $n$ from a Bernoulli distribution, $X_i \sim \mathrm{BIN}(1, p)$.
(a) Derive a UMP test of $H_0: p \le p_0$ versus $H_1: p > p_0$ using Theorem 12.7.1.
(b) Derive the test in (a) using Theorem 12.7.2.


Suppose that $X_1, \ldots, X_n$ is a random sample from a Weibull distribution, $X_i \sim \mathrm{WEI}(\theta, 2)$. Derive a UMP test of $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$ using Theorem 12.7.2.


Show that the test of Exercise 20 is unbiased.


Consider the hypotheses of Example 12.7.5, and consider a test with critical region of the form $C = \{(x_1, \ldots, x_n) \mid s_0 \le c_1 \text{ or } s_0 \ge c_2\}$ where $s_0 = \sum x_i^2/\sigma_0^2$ and where $c_1$ and $c_2$ are chosen to provide a test of size $\alpha$.
(a) Show that the power function of such a test has the form $\pi(\sigma^2) = 1 - H(c_2\sigma_0^2/\sigma^2; n) + H(c_1\sigma_0^2/\sigma^2; n)$ where $H(c; n)$ is the CDF of $\chi^2(n)$.
(b) Show that for this test to be unbiased, it is necessary that $c_1$ and $c_2$ satisfy the equations $H(c_2; n) - H(c_1; n) = 1 - \alpha$ and $c_2 h(c_2; n) - c_1 h(c_1; n) = 0$ where $h(c; n) = H'(c; n)$. Hint: For the test to be unbiased, the minimum of $\pi(\sigma^2)$ must occur at $\sigma^2 = \sigma_0^2$. In Example 12.7.5, $\alpha_1 = H(c_1; n)$ and $\alpha_2 = 1 - H(c_2; n)$.


Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution, $X_i \sim \mathrm{EXP}(\theta)$.
(a) Derive the generalized likelihood ratio (GLR) test of $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$. Determine an approximate critical value for size $\alpha$ using the large-sample chi-square approximation.
(b) Derive the GLR test of $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$.


Consider independent random samples of size $n_1$ and $n_2$ from respective exponential distributions, $X_i \sim \mathrm{EXP}(\theta_1)$ and $Y_j \sim \mathrm{EXP}(\theta_2)$. Derive the GLR test of $H_0: \theta_1 = \theta_2$ versus $H_1: \theta_1 \ne \theta_2$.


Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f(x; \theta) = 1/\theta$ if $0 < x < \theta$ and zero otherwise. Derive the GLR test of $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$.


Consider independent random samples of size $n_1$ and $n_2$ from respective normal distributions, $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$.
(a) Derive the GLR test of $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$, assuming that $\mu_1$ and $\mu_2$ are known.
(b) Rework (a) assuming that $\mu_1$ and $\mu_2$ are unknown.
(c) Derive the GLR for testing $H_0: \mu_1 = \mu_2$ and $\sigma_1^2 = \sigma_2^2$ against the alternative $H_1: \mu_1 \ne \mu_2$ or $\sigma_1^2 \ne \sigma_2^2$.


Suppose that $X$ is a continuous random variable with pdf $f(x; \theta) = \theta x^{\theta - 1}$ if $0 < x < 1$, and zero otherwise. Derive the GLR test of $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ based on a random sample of size $n$. Determine an approximate critical value for a size $\alpha$ test based on a large-sample approximation.


To compare the effectiveness of three competing weight-loss systems, 10 dieters were 
assigned randomly to each system, and the results measured after six months. The 
following weight losses (in pounds) were reported: 


System 1: 4.3, 10.2, 4.4, 23.5, 54.0, 5.7, 10.6, 47.3, 9.9, 37.5 
System 2: 10.7, 6.4, 33.5, 54.1, 25.7, 11.6, 17.3, 9.4, 7.5, 5.0 
System 3: 51.0, 5.6, 10.3, 47.3, 2.9, 27.5, 14.3, 1.2, 3.4, 13.5 


(a) Assuming that the data are normally distributed, use the result of Theorem 12.8.1 to test the hypothesis that all three systems are equally effective in reducing weight.

(b) Use the results of Theorems 12.8.2 and 12.8.3 to test the hypothesis that the variance 
in weight loss is the same for all three systems. 


Consider a random sample of size $n$ from a distribution with pdf $f(x; \theta, \mu) = 1/(2\theta)$ if $|x - \mu| < \theta$, and zero otherwise. Test $H_0: \mu = 0$ against $H_1: \mu \ne 0$. Show that the GLR $\lambda = \lambda(x)$ is given by

$$
\lambda^{1/n} = \frac{x_{n:n} - x_{1:n}}{2 \max(-x_{1:n}, x_{n:n})}
$$

Note that in this case approximately $-2 \ln \lambda \sim \chi^2(2)$ because of equation (12.8.2).


Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution.
(a) Show that the GLR for testing $H_0: X \sim N(\mu, \sigma^2)$ against $H_1: X \sim \mathrm{EXP}(\theta, \eta)$ is a function of $\hat{\theta}/\hat{\sigma}$. Is the distribution of this statistic free of unknown parameters under $H_0$?
(b) Show that the GLR for testing $H_0: X \sim N(\mu, \sigma^2)$ against $H_1: X \sim \mathrm{DE}(\theta, \eta)$ is a function of $\hat{\theta}/\hat{\sigma}$.


Consider a random sample of size $n$ from a two-parameter exponential distribution, $X_i \sim \mathrm{EXP}(\theta, \eta)$, and let $\hat{\eta}$ and $\hat{\theta}$ be the MLEs.
(a) Show that $\hat{\eta}$ and $\hat{\theta}$ are independent. Hint: Use the results of Exercise 30 of Chapter 10.
(b) Let $V_1 = 2n(\bar{X} - \eta)/\theta$, $V_2 = 2n(\hat{\eta} - \eta)/\theta$, and $V_3 = 2n\hat{\theta}/\theta$. Show that $V_1 \sim \chi^2(2n)$, $V_2 \sim \chi^2(2)$, and $V_3 \sim \chi^2(2n - 2)$. Hint: Note that $V_1 = V_2 + V_3$ and that $V_2$ and $V_3$ are independent. Find the MGF of $V_3$ by the approach used in Theorem 8.3.6.
(c) Show that $(n - 1)(\hat{\eta} - \eta)/\hat{\theta} \sim F(2, 2n - 2)$.
(d) Derive the GLR for a test of $H_0: \eta = \eta_0$ versus $H_1: \eta > \eta_0$.
(e) Show that the critical region for a size $\alpha$ GLR test is equivalent to $(n - 1)(\hat{\eta} - \eta_0)/\hat{\theta} > f_{1-\alpha}(2, 2n - 2)$.
Suppose that $X$ and $Y$ are independent with $X \sim \mathrm{POI}(\mu_1)$ and $Y \sim \mathrm{POI}(\mu_2)$, and let $S = X + Y$.
(a) Show that the conditional distribution of $X$ given $S = s$ is $\mathrm{BIN}(s, p)$ where $p = \mu_1/(\mu_1 + \mu_2)$.
(b) Use the result of (a) to construct a conditional test of $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$.
(c) Construct a conditional test of $H_0: \mu_1/\mu_2 = c_0$ versus $H_1: \mu_1/\mu_2 \ne c_0$ for some specified $c_0$.


Consider the hypotheses $H_0: \sigma = 1$ versus $H_1: \sigma = 3$ when the distribution is normal with $\mu = 0$. If $x_i$ denotes the $i$th sample value:
(a) Compute the approximate critical values $k_0^*$ and $k_1^*$ for an SPRT with $P[\text{Type I error}] = 0.10$ and $P[\text{Type II error}] = 0.05$.
(b) Derive the SPRT for testing these hypotheses.
(c) Find a sequential test procedure that is stated in terms of the sequence of sums $s_n = \sum_{i=1}^{n} x_i^2$ and is equivalent to the SPRT for testing $H_0$ against $H_1$.
(d) Find the approximate expected sample size for the test in (a) if $H_0$ is true. What is the approximate expected sample size if $H_1$ is true?
(e) Suppose the first 10 values of $x_i$ are: $-2.20$, 0.50, 2.55, $-1.85$, $-0.45$, $-1.15$, $-0.58$, 5.65, 0.49, and $-1.16$. Would the test in (a) terminate before more data is needed?


Suppose a population is Poisson distributed with mean $\mu$. Consider an SPRT for testing $H_0: \mu = 1$ versus $H_1: \mu = 2$.

(a) Express the SPRT in terms of the sequence of sums $s_n = \sum_{i=1}^{n} x_i$.

(b) Find the approximate expected sample size if $H_1$ is true when $\alpha = 0.01$ and $\beta = 0.02$.


Gross and Clark (1975, page 105) consider the following relief times (in hours) of 20 patients who received an analgesic:
1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7, 4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0
(a) Assuming that the times were taken sequentially and that relief times are independent and exponentially distributed, $X_i \sim \mathrm{EXP}(\theta)$, use an approximate SPRT to test the hypotheses $H_0: \theta = 2.0$ versus $H_1: \theta = 4.0$ with $\alpha = 0.10$ and $\beta = 0.05$.
(b) Approximate the expected sample size for the test in (a) when $H_0$ is true.
(c) Approximate the expected sample size for the test in (a) when $H_1$ is true.


Prove the identity (12.8.5). Hint: Within the squared terms of the left side, add and subtract $\bar{x}_i$, and then use the binomial expansion.
Most of the models discussed to this point have been expressed in terms of a pdf $f(x; \theta)$ that has a known functional form. Moreover, most of the statistical methods discussed in the preceding chapters, such as maximum likelihood estimation, are derived relative to specific models. In many situations, it is not possible to identify precisely which model applies. Thus, general statistical methods for testing how well a given model “fits,” relative to a set of data, are desirable. Another question of interest, which cannot always be answered without the aid of statistical methods, concerns whether random variables are independent. One possible answer to this question involves the notion of contingency tables.
13.2 ONE-SAMPLE BINOMIAL CASE


First let us consider a Bernoulli-trial type of situation with two possible outcomes, $A_1$ and $A_2$, with $P(A_1) = p_1$ and $P(A_2) = p_2 = 1 - p_1$. A random sample of $n$ trials is observed, and we let $o_1 = x$ and $o_2 = n - x$ denote the observed numbers of outcomes of type $A_1$ and type $A_2$, respectively. We wish to test $H_0: p_1 = p_{10}$ against $H_a: p_1 \neq p_{10}$. Under $H_0$ the expected numbers of outcomes of each type are $e_1 = np_{10}$ and $e_2 = np_{20} = n(1 - p_{10})$. This situation is illustrated in Table 13.1.

TABLE 13.1 Values of expected and observed outcomes for a binomial experiment

Possible outcomes    $A_1$               $A_2$               Total
Probabilities        $p_{10}$            $p_{20}$            1
Expected outcomes    $e_1 = np_{10}$     $e_2 = np_{20}$     $n$
Observed outcomes    $o_1 = x$           $o_2 = n - x$       $n$

We have discussed an exact small-sample binomial test, and we also have discussed an approximate test based on the normal approximation to the binomial. The approximate test also can be expressed in terms of a chi-square variable, and this form can be generalized to the case of more than two possible types of outcomes and more than one sample.

The square of the approximately normally distributed test statistic will be approximately distributed as $\chi^2(1)$, and it can be expressed as

$$\chi^2 = \frac{(x - np_{10})^2}{np_{10}(1 - p_{10})} = \frac{(x - np_{10})^2}{np_{10}} + \frac{(x - np_{10})^2}{n(1 - p_{10})} = \frac{(x - np_{10})^2}{np_{10}} + \frac{[(n - x) - n(1 - p_{10})]^2}{n(1 - p_{10})} = \sum_{j=1}^{2}\frac{(o_j - e_j)^2}{e_j} \qquad (13.2.1)$$

An approximate size $\alpha$ test of $H_0$ is to reject $H_0$ if $\chi^2 > \chi^2_{1-\alpha}(1)$.

The $\chi^2$ statistic reflects the amount of disagreement between the observed outcomes and the expected outcomes under $H_0$, and it has an intuitively appealing form. In this form, the differences in both cells are squared, but the differences are linearly dependent because $\sum_{j=1}^{2}(o_j - e_j) = 0$, and the number of degrees of freedom is one less than the number of cells.
We should note that the chi-square approximation can be improved in this case by introducing a correction for discontinuity. Thus, a somewhat more accurate result is obtained by using the statistic

$$\chi^2 = \sum_{j=1}^{2}\frac{(|o_j - e_j| - 0.5)^2}{e_j} \qquad (13.2.2)$$

The chi-squared test statistic may be generalized in two ways: it can be extended to apply to a multinomial problem with $c$ types of outcomes, and it can be generalized to an $r$-sample binomial problem. We will consider the $r$-sample binomial problem first.

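As a quick numerical check of (13.2.1) and (13.2.2), here is a minimal Python sketch using only the standard library. The function names `binomial_chi2` and `chi2_1_upper_tail` are our own, and the data ($x = 30$ outcomes of type $A_1$ in $n = 100$ trials, $p_{10} = 0.2$) are hypothetical, chosen only to show that the two-cell statistic equals $z^2$. The p-value uses the fact that a $\chi^2(1)$ variable is the square of a standard normal variable.

```python
import math


def binomial_chi2(x, n, p10, continuity=False):
    """Two-cell chi-square statistic for H0: p1 = p10 (hypothetical helper).

    continuity=False gives (13.2.1); continuity=True subtracts 0.5 from each
    |o_j - e_j| as in the corrected statistic (13.2.2).
    """
    observed = [x, n - x]
    expected = [n * p10, n * (1 - p10)]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if continuity:
            diff -= 0.5
        stat += diff ** 2 / e
    return stat


def chi2_1_upper_tail(q):
    """P[chi-square(1) > q] = P[|Z| > sqrt(q)] for a standard normal Z."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical data: x = 30 outcomes of type A1 in n = 100 trials, H0: p1 = 0.2
x, n, p10 = 30, 100, 0.2
z = (x - n * p10) / math.sqrt(n * p10 * (1 - p10))
stat = binomial_chi2(x, n, p10)
print(stat, z ** 2)                                   # equal up to rounding
print(binomial_chi2(x, n, p10, continuity=True))      # corrected statistic
print(chi2_1_upper_tail(stat))                        # approximate p-value
```

With these hypothetical numbers the statistic is $6.25 = 2.5^2$, so $H_0$ would be rejected at $\alpha = 0.05$ but not at $\alpha = 0.01$.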

13.3 r-SAMPLE BINOMIAL TEST (COMPLETELY SPECIFIED $H_0$)


Suppose now that $X_i \sim BIN(n_i, p_i)$ for $i = 1, \ldots, r$, and we wish to test $H_0: p_i = p_{i0}$, where the $p_{i0}$ are known constants. Now let $o_{i1} = x_i$ and $o_{i2} = n_i - x_i$ denote the observed outcomes in the $i$th sample, and let $e_{i1} = n_i p_{i0}$ and $e_{i2} = n_i(1 - p_{i0})$ denote the expected outcomes under $H_0$.

Because a sum of independent chi-square variables is chi-square distributed, we have approximately

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{2}\frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2(r) \qquad (13.3.1)$$

An approximate size $\alpha$ test is to reject $H_0$ if $\chi^2 > \chi^2_{1-\alpha}(r)$.

Example 13.3.1 A certain characteristic is believed to be present in 20% of the population. Random samples of sizes 50, 100, and 50 are tested from each of three different races, with the observed outcomes shown in Table 13.2.

The expected numbers of outcomes under $H_0: p_1 = p_2 = p_3 = 0.20$ then are $e_{11} = 50(0.2) = 10$, $e_{21} = 100(0.2) = 20$, and $e_{31} = 50(0.2) = 10$. The remaining $e_{ij}$ may be obtained by subtraction and are shown in Table 13.3.

TABLE 13.2 Observed outcomes for a three-sample binomial test

          Observed Outcomes
          Present    Absent    Total
Race 1    20         30        50
Race 2    25         75        100
Race 3    15         35        50
TABLE 13.3 Expected outcomes for a three-sample binomial test

          Present    Absent    Total
Race 1    10         40        50
Race 2    20         80        100
Race 3    10         40        50

We have

$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{2}\frac{(o_{ij} - e_{ij})^2}{e_{ij}} = \frac{(20 - 10)^2}{10} + \frac{(30 - 40)^2}{40} + \cdots + \frac{(35 - 40)^2}{40} = 17.18$$

Because $\chi^2_{0.99}(3) = 11.3$, we may reject $H_0$ at the $\alpha = 0.01$ significance level.

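A minimal Python sketch of statistic (13.3.1), applied to the counts of Example 13.3.1. The helper name `r_sample_binomial_chi2` is hypothetical; it returns the statistic together with $r$ degrees of freedom (nothing is estimated), and the critical value is taken from the text rather than computed.

```python
def r_sample_binomial_chi2(successes, sizes, p0):
    """Chi-square statistic (13.3.1) for H0: p_i = p_i0, i = 1, ..., r.

    Returns (statistic, degrees of freedom); df = r because no parameter
    is estimated.
    """
    stat = 0.0
    for x, n, p in zip(successes, sizes, p0):
        # two cells per sample: "present" (x) and "absent" (n - x)
        for o, e in ((x, n * p), (n - x, n * (1 - p))):
            stat += (o - e) ** 2 / e
    return stat, len(sizes)


# Example 13.3.1: characteristic present in 20, 25, 15 of samples of 50, 100, 50
stat, df = r_sample_binomial_chi2([20, 25, 15], [50, 100, 50], [0.2, 0.2, 0.2])
print(round(stat, 4), df)   # compare with the tabled value 11.3 for df = 3
```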

TEST OF COMMON p


Perhaps a more common problem is to test whether the $p_i$ are all equal, $H_0: p_1 = p_2 = \cdots = p_r = p$, where the common value $p$ is not specified. We still have the same $r \times 2$ table of observed outcomes, but the value $p$ must be estimated to estimate the expected numbers under $H_0$. Under $H_0$ the MLE of $p$ is the pooled estimate

$$\hat{p} = \frac{\sum_{i=1}^{r} x_i}{N}$$

and $\hat{e}_{i1} = n_i\hat{p}$, $\hat{e}_{i2} = n_i(1 - \hat{p})$, where $N = \sum_{i=1}^{r} n_i$.

The test statistic is

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{2}\frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} \qquad (13.3.2)$$

The limiting distribution of this statistic can be shown to be chi-squared with $r - 1$ degrees of freedom, so an approximate size $\alpha$ test is to reject $H_0$ if

$$\chi^2 > \chi^2_{1-\alpha}(r - 1) \qquad (13.3.3)$$

Quite generally in problems of this nature, one degree of freedom is lost for each unknown parameter estimated. This is quite similar to normal sampling, where

$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n) \quad \text{and} \quad \sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2(n - 1)$$
We again have a linear relationship, because $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$, and the effect is that the latter sum is equivalent to a sum of $n - 1$ independent squared standard normal variables. The degrees of freedom also can be heuristically illustrated by considering the $r \times 2$ table of observed outcomes in Table 13.4.

TABLE 13.4 Table of r-sample binomial observations

Sample    $A_1$          $A_2$             Total
1         $o_{11}$       $o_{12}$          $n_1$
2         $o_{21}$       $o_{22}$          $n_2$
...
r         $o_{r1}$       $o_{r2}$          $n_r$
Total     $N\hat{p}$     $N(1 - \hat{p})$  $N$

Having a fixed value of the estimate $\hat{p}$ corresponds to having all of the marginal totals in the table fixed. Thus $r - 1$ numbers in the interior of the table can be assigned, and all the remaining numbers then will be determined. In general, the number of degrees of freedom associated with an $r \times c$ table with fixed marginal totals is

$$(r - 1)(c - 1)$$


Example 13.3.2 Consider again the data in Example 13.3.1, but suppose we had chosen to test the hypothesis that the proportion having the characteristic is the same in the three races, $H_0: p_1 = p_2 = p_3 = p$. In this case, $\hat{p} = 60/200 = 0.3$, $\hat{e}_{11} = 50(60/200) = 15$, $\hat{e}_{21} = 100(60/200) = 30$, and the remaining expected numbers may be obtained by subtraction. These expected values are included in Table 13.5 in parentheses. In this case

$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{2}\frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} = \frac{(20 - 15)^2}{15} + \frac{(30 - 35)^2}{35} + \cdots + \frac{(35 - 35)^2}{35} = 3.57$$

Because $\chi^2_{0.99}(2) = 9.21$, we cannot reject the hypothesis of common proportions at the $\alpha = 0.01$ level.


TABLE 13.5 Observed and expected numbers under null hypothesis of equal proportions

          Present    Absent     Total
Race 1    20 (15)    30 (35)    50
Race 2    25 (30)    75 (70)    100
Race 3    15 (15)    35 (35)    50
Total     60         140        200

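The pooled-estimate version (13.3.2) differs from the previous computation only in replacing each $p_{i0}$ by $\hat{p}$ and using $r - 1$ degrees of freedom. A minimal sketch reproducing Example 13.3.2; the function name `common_p_chi2` is hypothetical.

```python
def common_p_chi2(successes, sizes):
    """Chi-square statistic (13.3.2) for H0: p_1 = ... = p_r, p unspecified.

    p is replaced by the pooled MLE, so df = r - 1 as in (13.3.3).
    Returns (statistic, df, p_hat).
    """
    N = sum(sizes)
    p_hat = sum(successes) / N          # pooled estimate of the common p
    stat = 0.0
    for x, n in zip(successes, sizes):
        for o, e in ((x, n * p_hat), (n - x, n * (1 - p_hat))):
            stat += (o - e) ** 2 / e
    return stat, len(sizes) - 1, p_hat


stat, df, p_hat = common_p_chi2([20, 25, 15], [50, 100, 50])
print(p_hat, round(stat, 2), df)   # compare with chi-square_{0.99}(2) = 9.21
```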

Note that reducing the degrees of freedom when a parameter is estimated results in a smaller critical value. This seems reasonable, because estimating $p$ automatically forces some agreement between the $\hat{e}_{ij}$ and $o_{ij}$, and a smaller computed chi-squared value therefore is considered significant. In this example the smaller critical value still is not exceeded, which suggests that the proportions may all be equal, although we had rejected the simple hypothesis that they were all equal to 0.2 in Example 13.3.1.

In Section 13.2 we mentioned that a correction for discontinuity may enhance the accuracy of the chi-squared approximation for small sample sizes. The chi-squared approximation generally is considered to be sufficiently accurate for practical purposes if all the expected values are at least 2 and at least 80% of them are 5 or more.

13.4 ONE-SAMPLE MULTINOMIAL

Suppose now that there are $c$ possible types of outcomes, $A_1, A_2, \ldots, A_c$, and in a sample of size $n$ let $o_1, \ldots, o_c$ denote the numbers of observed outcomes of each type. We assume probabilities $P(A_j) = p_j$, $j = 1, \ldots, c$, where $\sum_{j=1}^{c} p_j = 1$, and we wish to test the completely specified hypothesis $H_0: p_j = p_{j0}$, $j = 1, \ldots, c$. Under $H_0$ the expected values for each type are given by $e_j = np_{j0}$. The chi-square statistic again provides an appealing and convenient test statistic, where approximately

$$\chi^2 = \sum_{j=1}^{c}\frac{(o_j - e_j)^2}{e_j} \sim \chi^2(c - 1) \qquad (13.4.1)$$

It is possible to show that the limiting distribution of this statistic under $H_0$ is $\chi^2(c - 1)$, and this is consistent with earlier remarks concerning what the appropriate number of degrees of freedom turns out to be in these problems. Equation (13.2.1) illustrated that one degree of freedom was appropriate for the binomial case with $c = 2$. Also, for fixed sample size, $c - 1$ observed values determine the remaining observed value.

Example 13.4.1 A die is rolled 60 times, and we wish to test whether it is a fair die, $H_0: p_j = 1/6$, $j = 1, \ldots, 6$. Under $H_0$ the expected outcome in each case is $e_j = np_{j0} = 10$, and the results are depicted in Table 13.6. In this case

$$\chi^2 = \sum_{j=1}^{6}\frac{(o_j - e_j)^2}{e_j} = 6.0 < 9.24 = \chi^2_{0.90}(5)$$

so we cannot reject $H_0$ at the $\alpha = 0.10$ level of significance.
TABLE 13.6 Observed and expected frequencies for a die-rolling experiment

Face        1     2     3     4     5     6     Total
Observed    8     11    5     12    15    9     60
Expected    10    10    10    10    10    10    60

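Statistic (13.4.1) is equally short to compute. A minimal sketch for the die data of Table 13.6; the helper name `multinomial_chi2` is hypothetical, and the critical value is the one quoted in Example 13.4.1.

```python
def multinomial_chi2(observed, probs):
    """Chi-square statistic (13.4.1) for a completely specified multinomial H0.

    Returns (statistic, degrees of freedom = c - 1).
    """
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    return stat, len(observed) - 1


# Example 13.4.1: 60 rolls of a die, H0: each face has probability 1/6
stat, df = multinomial_chi2([8, 11, 5, 12, 15, 9], [1 / 6] * 6)
print(round(stat, 3), df)   # compare with chi-square_{0.90}(5) = 9.24
```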

As suggested in Section 13.3, if the model is not specified completely under $H_0$, then it is necessary to estimate the expected numbers, $\hat{e}_j$, and the number of degrees of freedom is reduced by one for each parameter estimated. This aspect is discussed further in later sections.


13.5 r-SAMPLE MULTINOMIAL


We may wish to test whether $r$ samples come from the same multinomial population, or equivalently whether $r$ multinomial populations are the same. Let $A_1, A_2, \ldots, A_c$ denote $c$ possible types of outcomes, and let the probability that an outcome of type $A_j$ will occur for the $i$th population (or $i$th sample) be denoted by $p_{j|i}$. Note that $\sum_{j=1}^{c} p_{j|i} = 1$ for each $i = 1, \ldots, r$. Also let $o_{ij}$ denote the observed number of outcomes of type $A_j$ in sample $i$. For a completely specified $H_0: p_{j|i} = p_{j|i}^{0}$, we have $e_{ij} = n_i p_{j|i}^{0}$ under $H_0$, and it is clear that equation (13.3.1) can be extended to this case. Approximately, for each $i$,

$$\sum_{j=1}^{c}\frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2(c - 1)$$

and

$$\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2(r(c - 1)) \qquad (13.5.1)$$

under $H_0$.
The more common problem is to test whether the $r$ multinomial populations are the same without specifying the values of the $p_{j|i}$. Thus we consider

$$H_0: p_{j|1} = p_{j|2} = \cdots = p_{j|r} = p_j \quad \text{for } j = 1, 2, \ldots, c$$

We must estimate $c - 1$ parameters $p_1, \ldots, p_{c-1}$, which also will determine the estimate of $p_c$ because $\sum_{j=1}^{c} p_j = 1$. Under $H_0$ the MLE of $p_j$ will be the pooled estimate from the pooled sample of $N = \sum_{i=1}^{r} n_i$ items, which gives

$$\hat{p}_j = \frac{c_j}{N}$$

where $c_j$ is the $j$th column total, and

$$\hat{e}_{ij} = n_i\hat{p}_j = n_i c_j/N$$

The number of degrees of freedom in this case is $r(c - 1) - (c - 1) = (r - 1)(c - 1)$, and approximately

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} \sim \chi^2((r - 1)(c - 1)) \qquad (13.5.2)$$

Example 13.5.1 In Example 13.3.1, suppose that the characteristic of interest may occur at three different levels: absent, moderate, or severe. We are interested in knowing whether the proportion of each level is the same for different races.

The notation is depicted in Table 13.7. We wish to test $H_0: p_{j|1} = p_{j|2} = p_{j|3} = p_j$ for $j = 1, 2, 3$.

TABLE 13.7 Conditional probabilities for a three-sample binomial test

                      $A_1$        $A_2$        $A_3$
                      Severe       Moderate     Absent
Sample 1 (Race 1)     $p_{1|1}$    $p_{2|1}$    $p_{3|1}$    1
Sample 2 (Race 2)     $p_{1|2}$    $p_{2|2}$    $p_{3|2}$    1
Sample 3 (Race 3)     $p_{1|3}$    $p_{2|3}$    $p_{3|3}$    1
                      $p_1$        $p_2$        $p_3$        1

The observed outcomes are shown in Table 13.8. The estimated expected numbers under $H_0$ are given in parentheses, where $\hat{e}_{ij} = n_i\hat{p}_j = n_i(c_j/N)$; for example, $\hat{e}_{11} = n_1(c_1/N) = 50(24)/200 = 6$, $\hat{e}_{12} = n_1(c_2/N) = 9$, $\hat{e}_{21} = n_2(c_1/N) = 12$, $\hat{e}_{22} = n_2(c_2/N) = 18$, and the others may be obtained by subtraction.

TABLE 13.8 Observed and expected outcomes

                 Observed Outcomes (Expected Outcomes)
                 Severe     Moderate    Absent     Total
$S_1$ (Race 1)   10 (6)     10 (9)      30 (35)    50
$S_2$ (Race 2)   4 (12)     21 (18)     75 (70)    100
$S_3$ (Race 3)   10 (6)     5 (9)       35 (35)    50
Total            24         36          140        200

The number of degrees of freedom in this case is $(r - 1)(c - 1) = 2(2) = 4$ and

$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{3}\frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} = 14.13 > 13.3 = \chi^2_{0.99}(4)$$

so $H_0$ can be rejected at the $\alpha = 0.01$ level.

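The expected counts $\hat{e}_{ij} = n_i c_j / N$ depend only on the row and column totals, so (13.5.2) can be computed directly from the table of observed counts. A minimal sketch, assuming the counts are given as a list of rows; the function name `homogeneity_chi2` is ours. It is applied here to Table 13.8.

```python
def homogeneity_chi2(table):
    """Chi-square statistic (13.5.2) for equal multinomial proportions across rows.

    table[i][j] is o_ij; the estimated expected counts are e_ij = n_i * c_j / N.
    Returns (statistic, df, expected table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    expected = [[n_i * c_j / N for c_j in col_totals] for n_i in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Example 13.5.1 (columns: severe, moderate, absent; rows: races 1, 2, 3)
table = [[10, 10, 30],
         [4, 21, 75],
         [10, 5, 35]]
stat, df, expected = homogeneity_chi2(table)
print(expected[0])          # [6.0, 9.0, 35.0]
print(round(stat, 2), df)   # compare with chi-square_{0.99}(4) = 13.3
```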

The question of whether the characteristic proportions change over samples (in this example, over races) is similar to the question of whether these two factors (characteristic and race) are independent. Note that in the $r$-sample multinomial case, the row totals (sample sizes) are fixed, and it appears natural to consider the test of common proportions over samples, sometimes referred to as a test of homogeneity. In the example, one could have selected 200 individuals at random and then counted the number that fall in each race category and each characteristic category. In this case, the row totals and column totals are both random. We can look at the conditional test in this case, given the row totals, and analyze the data in the same way as before, but it appears somewhat more natural to look directly at a test of independence in this case. It turns out that the same test statistic is applicable under this interpretation, as discussed in the next section.


13.6 TEST FOR INDEPENDENCE, $r \times c$ CONTINGENCY TABLE


Suppose that one factor with $c$ categories is associated with columns and a second factor with $r$ categories is associated with rows in an $r \times c$ contingency table. Let $p_{ij}$ denote the probability that a sampled item is classified in the $i$th row category and the $j$th column category. Let $p_{i\cdot} = \sum_{j=1}^{c} p_{ij}$ denote the marginal probability that an individual is classified in row $i$, and let $p_{\cdot j} = \sum_{i=1}^{r} p_{ij}$ denote the marginal probability that an individual is classified in the $j$th column, as illustrated in Table 13.9.

TABLE 13.9 Contingency table of joint and marginal probabilities

                    Columns
                    1            2            ...    c
Rows    1           $p_{11}$     $p_{12}$     ...    $p_{1c}$     $p_{1\cdot}$
        2           $p_{21}$     $p_{22}$     ...    $p_{2c}$     $p_{2\cdot}$
        ...
        r           $p_{r1}$     $p_{r2}$     ...    $p_{rc}$     $p_{r\cdot}$
                    $p_{\cdot 1}$  $p_{\cdot 2}$  ...  $p_{\cdot c}$  1

Note that the total joint probabilities in this case add to 1, whereas in Table 13.7 the probabilities under consideration correspond to conditional probabilities, and each row adds to 1.

If the classification of an individual according to one factor is not affected by its classification relative to the other factor, then the two factors are independent. That is, they are independent if the joint classification probabilities are the products of the marginal classification probabilities, $p_{ij} = p_{i\cdot}p_{\cdot j}$. Thus, to test independence we test $H_0: p_{ij} = p_{i\cdot}p_{\cdot j}$. Let $n_i = \sum_{j=1}^{c} o_{ij}$ and $c_j = \sum_{i=1}^{r} o_{ij}$ denote the row and column totals as before, although the $n_i$ are not fixed before the sample in this case. Let $N = \sum_{i=1}^{r} n_i$ denote the total number of outcomes. Then $\hat{p}_{i\cdot} = n_i/N$, $\hat{p}_{\cdot j} = c_j/N$, and under $H_0$ the expected number of outcomes to fall in the $(i, j)$ cell is estimated to be

$$\hat{e}_{ij} = N\hat{p}_{ij} = N\hat{p}_{i\cdot}\hat{p}_{\cdot j} = N\left(\frac{n_i}{N}\right)\left(\frac{c_j}{N}\right) = n_i c_j/N$$


We note that $\hat{e}_{ij}$ reduces to exactly the same values obtained in the previous problem of testing equal proportions over samples. Thus the chi-squared statistic for measuring the agreement between the observed outcomes $o_{ij}$ and the expected numbers under $H_0$, $\hat{e}_{ij}$, is computed exactly the same as before. Also, as before, asymptotic results show that approximately

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} \sim \chi^2((r - 1)(c - 1)) \qquad (13.6.1)$$

With regard to the number of degrees of freedom, estimating the marginal probabilities $p_{i\cdot}$ and $p_{\cdot j}$ amounts to fixing the marginal totals, which then leaves $(r - 1)(c - 1)$ degrees of freedom. This test is also similar to the asymptotic GLR test based on $-2\ln\lambda$. In that case, the number of free parameters in $\Omega$ is $rc - 1$ because $\sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij} = 1$, and the number of free parameters under $H_0$ is $(r - 1) + (c - 1)$ because $\sum_{i=1}^{r} p_{i\cdot} = 1$ and $\sum_{j=1}^{c} p_{\cdot j} = 1$; thus the number of degrees of freedom associated with $H_0$ is $(rc - 1) - (r - 1) - (c - 1) = (r - 1)(c - 1)$. This result also is consistent with the interpretation discussed in Section 13.4: for a completely specified $H_0$ the number of degrees of freedom is one less than the total number of cells, which in this problem becomes $rc - 1$. If $H_0$ is not completely specified, then the number of degrees of freedom is reduced by one for each parameter estimated, which in this case is $(r - 1) + (c - 1)$, and this again results in $(r - 1)(c - 1)$ degrees of freedom. The formal justification for the approximate distributions and choice of degrees of freedom is based on asymptotic results that are not derived here.

Example 13.6.1 A survey is taken to determine whether there is a relationship between political affiliation and strength of support for space exploration. We randomly select 100 individuals and ask their political affiliation and their support level to obtain the (artificial) data in Table 13.10.


TABLE 13.10 Contingency table for testing independence of two factors

                           Support
Affiliation    Increase    Same         Decrease     Total
Republican     8 (9)       12 (10.5)    10 (10.5)    30
Democrat       10 (12)     17 (14)      13 (14)      40
Independent    12 (9)      6 (10.5)     12 (10.5)    30
Total          30          35           35           100


Under the hypothesis of independence, $H_0: p_{ij} = p_{i\cdot}p_{\cdot j}$, the expected values are computed and given in parentheses. We have

$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{3}\frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} = 4.54 < 7.78 = \chi^2_{0.90}(4)$$

thus we do not have sufficient evidence to reject the hypothesis of independence at the $\alpha = 0.10$ significance level. Of course, we would obtain the identical result if we considered a conditional test given fixed row totals and tested whether the support-level probabilities are the same over the political affiliation categories. This is reasonable because, for example, if the "independents" had a higher probability for increased support, then they would have to have a lower probability in some other category, which would represent a dependence between the two factors.

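The text notes that this test is similar to the asymptotic GLR test based on $-2\ln\lambda$. For multinomial counts with $\hat{e}_{ij} = n_i c_j / N$, that statistic takes the standard form $2\sum_{i}\sum_{j} o_{ij}\ln(o_{ij}/\hat{e}_{ij})$ (a known result that is not derived in this section). The minimal sketch below computes both statistics for Table 13.10; both are referred to $\chi^2((r - 1)(c - 1))$. The function name `independence_tests` is ours.

```python
import math


def independence_tests(table):
    """Pearson chi-square (13.6.1) and GLR statistic -2 ln(lambda) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    pearson = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / N   # estimated e_ij = n_i c_j / N
            pearson += (o - e) ** 2 / e
            if o > 0:                               # a zero cell contributes 0 to G^2
                g2 += 2 * o * math.log(o / e)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return pearson, g2, df


# Example 13.6.1 (rows: Republican, Democrat, Independent;
# columns: increase, same, decrease)
survey = [[8, 12, 10],
          [10, 17, 13],
          [12, 6, 12]]
pearson, g2, df = independence_tests(survey)
print(round(pearson, 2), round(g2, 2), df)   # compare with chi-square_{0.90}(4) = 7.78
```

Both statistics fall well below 7.78 here, so the two approaches lead to the same conclusion for these data.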

Indeed, if we express the notation in Table 13.7 in terms of the joint probabilities, then $p_{j|i} = p_{ij}/p_{i\cdot}$ represents the conditional probability of being in column $j$ given the $i$th row, and $p_j = p_{\cdot j}$ is the marginal probability of falling in the $j$th column classification; thus $H_0: p_{j|i} = p_j$ in the sampling setup in Section 13.5 is equivalent to the test of independence in this section, because $p_{j|i} = p_{ij}/p_{i\cdot} = p_{\cdot j}$ implies $p_{ij} = p_{i\cdot}p_{\cdot j}$.


13.7 CHI-SQUARED GOODNESS-OF-FIT TEST


The one-sample multinomial case discussed in Section 13.4 corresponds to testing whether a random sample comes from a completely specified multinomial distribution. This test can be adapted to test that a random sample comes from any completely specified distribution, $H_0: X \sim F(x)$. Simply divide the sample space into $c$ cells, say $A_1, \ldots, A_c$, and let $p_{j0} = P[X \in A_j]$ where $X \sim F(x)$. Then for a random sample of size $n$, let $o_j$ denote the number of observations that fall into the $j$th cell; under $H_0$ the expected number in the $j$th cell is $e_j = np_{j0}$. This is now back in the form of the multinomial problem, and $H_0: X \sim F(x)$ is rejected at the $\alpha$ significance level if

$$\chi^2 = \sum_{j=1}^{c}\frac{(o_j - e_j)^2}{e_j} > \chi^2_{1-\alpha}(c - 1) \qquad (13.7.1)$$

In some cases there may be a natural choice for the cells, or the data may be grouped to begin with; otherwise, artificial cells may be chosen. As a general principle, as many cells as possible should be used to increase the number of degrees of freedom, as long as $e_j \geq 5$ or so is maintained to ensure that the chi-squared approximation is fairly accurate.

Example 13.7.1 Let $X$ denote the repair time in days required for a certain component in an airplane. We wish to test whether a Poisson model with a mean of three days appears to be a reasonable model for this variable. The repair times for 40 components were recorded, with the results shown in Table 13.11. In some cases the component could be repaired immediately on-site, which is interpreted as zero days.

Under $H_0: X \sim POI(3)$, we have $f(x) = e^{-3}3^x/x!$, and the cell probabilities are given by $p_{10} = P[X = 0] = f(0) = e^{-3} = 0.050$, $p_{20} = f(1) = 3e^{-3} = 0.149$, $p_{30} = f(2) = 0.224$, and so on. The expected numbers are then $e_j = np_{j0}$.

TABLE 13.11 Observed and expected frequencies for chi-square goodness-of-fit test of Poisson model with mean 3

Repair Time (Days)          0        1        2        3        4        5        6        ≥7
Observed ($o_j$)            1        3        7        6        10       7        6        0
Probabilities ($p_{j0}$)    0.050    0.149    0.224    0.224    0.168    0.101    0.050    0.034
Expected ($e_j$)            2.00     5.96     8.96     8.96     6.72     4.04     2.00     1.36
Pooled expected: 7.96 (0 and 1 days combined), 7.40 (5 or more days combined)

The right-hand tail cells are pooled to achieve $e_j \geq 5$, and the first two cells also are pooled. This leaves $c = 5$ cells and

$$\chi^2 = \frac{(4 - 7.96)^2}{7.96} + \frac{(7 - 8.96)^2}{8.96} + \cdots + \frac{(13 - 7.40)^2}{7.40} = 9.22 > 7.78 = \chi^2_{0.90}(4)$$

so we can reject $H_0$ at the $\alpha = 0.10$ significance level.

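The cell pooling in Example 13.7.1 can be automated. A minimal sketch follows; the left-to-right pooling rule (merge adjacent cells until the expected count reaches 5, then fold any remaining small tail into the last cell) is a simple rule of our own choosing, not a prescribed procedure, and for these data it reproduces the five cells used above. The helper names are hypothetical. Because exact Poisson probabilities are used rather than the rounded entries of Table 13.11, the statistic differs slightly from the hand calculation.

```python
import math


def poisson_cell_probs(mu, max_value):
    """P[X = 0], ..., P[X = max_value - 1] and P[X >= max_value] for X ~ POI(mu)."""
    probs = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(max_value)]
    probs.append(1.0 - sum(probs))
    return probs


def pool_cells(observed, expected, min_expected=5.0):
    """Pool adjacent cells left to right until each expected count is >= min_expected.

    Leftover right-tail cells with too small an expected count are merged
    into the last pooled cell. This is one simple pooling rule, not the only one.
    """
    pooled_o, pooled_e = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0:  # leftover tail
        pooled_o[-1] += acc_o
        pooled_e[-1] += acc_e
    return pooled_o, pooled_e


# Example 13.7.1: repair times 0, 1, ..., 6 and ">= 7" days, n = 40, H0: X ~ POI(3)
observed = [1, 3, 7, 6, 10, 7, 6, 0]
n = sum(observed)
expected = [n * p for p in poisson_cell_probs(3.0, 7)]
o, e = pool_cells(observed, expected)
stat = sum((oj - ej) ** 2 / ej for oj, ej in zip(o, e))
df = len(o) - 1
print(len(o), round(stat, 2), df)   # compare with chi-square_{0.90}(4) = 7.78
```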

UNKNOWN PARAMETER CASE


When using goodness-of-fit procedures to help select an appropriate population model, we usually are interested in testing whether some family of distributions seems appropriate, such as $H_0: X \sim POI(\mu)$ or $H_0: X \sim N(\mu, \sigma^2)$, where the parameter values are unspecified, rather than testing a completely specified hypothesis such as $H_0: X \sim POI(3)$. That is, at this point we are not interested in lack of fit caused by a wrong parameter value; we are interested in whether the general model has an appropriate form and will be a suitable model when appropriate parameter values are used.

Suppose we wish to test $H_0: X \sim f(x; \theta_1, \ldots, \theta_k)$, where there are $k$ unknown parameters. To compute the $\chi^2$ statistic, the expected numbers under $H_0$ now must be estimated. If the original data are grouped into cells, then the joint density of the observed values, $o_j$, is multinomial, where the true but unknown $p_{j0} = P[X \in A_j]$ are functions of $\theta_1, \ldots, \theta_k$. If maximum likelihood estimation is used to estimate $\theta_1, \ldots, \theta_k$ (based on the multinomial distribution of the grouped data values $o_j$), then the limiting distribution of the $\chi^2$ statistic is chi-squared with $c - 1 - k$ degrees of freedom, where $c$ is the number of cells and $k$ is the number of parameters estimated. That is, approximately,

$$\chi^2 = \sum_{j=1}^{c}\frac{(o_j - \hat{e}_j)^2}{\hat{e}_j} \sim \chi^2(c - 1 - k) \qquad (13.7.2)$$

where $\hat{e}_j = n\hat{p}_{j0}$.

In many cases, ML estimation based on the grouped-data multinomial model is not convenient to carry out, and in practice the usual MLEs based on the ungrouped data, or on grouped-data approximations of these, most often are used.

If MLEs based on the individual observations are used, then the number of degrees of freedom may be greater than $c - 1 - k$, but the limiting distribution is bounded between chi-square distributions with $c - 1$ and $c - 1 - k$ degrees of freedom. Our policy here will be to use $c - 1 - k$ degrees of freedom if $k$ parameters are estimated by any ML procedure. A more conservative approach would be to bound the p-value of the test using $c - 1 - k$ and $c - 1$ degrees of freedom if the MLEs are not based directly on the grouped data (Kendall and Stuart, 1967, page 430).


Example 13.7.2 Consider again the data given in Example 13.7.1, and suppose now that we wish to test $H_0: X \sim POI(\mu)$. The usual MLE of $\mu$ is the average of the 40 repair times, which in this case is computed as

$$\hat{\mu} = [0(1) + 1(3) + 2(7) + \cdots + 6(6)]/40 = 3.65$$

Under $H_0$ the estimated probabilities are now $\hat{p}_{j0} = e^{-3.65}(3.65)^j/j!$, and the estimated expected values, $\hat{e}_j = n\hat{p}_{j0}$, are computed using a Poisson distribution with $\hat{\mu} = 3.65$. Retaining the same five cells used before, the results are as shown in Table 13.12.

TABLE 13.12 Observed and expected frequencies for chi-square test of Poisson model with estimated mean 3.65

Repair Time (Days)                0-1      2        3        4        ≥5
Observed ($o_j$)                  4        7        6        10       13
Probabilities ($\hat{p}_{j0}$)    0.121    0.173    0.211    0.192    0.303
Expected ($\hat{e}_j$)            4.84     6.92     8.44     7.68     12.12

We have

$$\chi^2 = \frac{(4 - 4.84)^2}{4.84} + \cdots + \frac{(13 - 12.12)^2}{12.12} = 1.62 < 6.25 = \chi^2_{0.90}(3)$$

so a Poisson model appears to be quite reasonable for these data, although the Poisson model with $\mu = 3$ was found not to fit well. The number of degrees of freedom here is 3, because one parameter is estimated.

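With one estimated parameter, the only changes are $\hat{\mu}$ in place of 3 and $c - 1 - k$ degrees of freedom, as in (13.7.2). A minimal sketch for Example 13.7.2, using exact rather than rounded Poisson probabilities (so the result is close to, but not identical with, the hand calculation); the name `pmf` is our own helper.

```python
import math

# Example 13.7.2: repair-time counts for 0, 1, ..., 6 days (none at 7 or more)
counts = {0: 1, 1: 3, 2: 7, 3: 6, 4: 10, 5: 7, 6: 6}
n = sum(counts.values())
mu_hat = sum(x * k for x, k in counts.items()) / n   # usual MLE: the sample mean


def pmf(x, mu):
    """Poisson probability P[X = x] for X ~ POI(mu)."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


# Same five cells as before: {0, 1}, {2}, {3}, {4}, {5 or more}
p = [pmf(0, mu_hat) + pmf(1, mu_hat), pmf(2, mu_hat), pmf(3, mu_hat), pmf(4, mu_hat)]
p.append(1.0 - sum(p))
observed = [4, 7, 6, 10, 13]
expected = [n * pj for pj in p]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
k = 1                                   # one parameter estimated
df = len(observed) - 1 - k              # c - 1 - k, as in (13.7.2)
print(mu_hat, round(stat, 2), df)       # compare with chi-square_{0.90}(3) = 6.25
```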

Note that the question of how to choose the cells is not quite so clear when $H_0$ is not completely specified. For a completely specified $H_0$ the best choice is to choose cells so that all the $e_j$ are approximately equal to 5. This makes the $e_j$ large enough to ensure the accuracy of the chi-squared approximation and still gives the largest possible number of degrees of freedom. Of course, with discrete distributions this may not be completely achievable. If $H_0$ is not completely specified, then the $e_j$ or $\hat{e}_j$ cannot be computed before taking the sample. The usual procedure is to choose some natural or reasonable cell division initially, and then pool adjacent cells after the data are taken to achieve $\hat{e}_j \geq 5$. This pooling should not be done in a capricious manner. In some cases the data already are grouped, and this provides an initial cell division. Indeed, one advantage of the chi-squared goodness-of-fit statistic is that it is applicable to grouped data. On the other hand, if the individual observations are available, then some information may be lost by using only grouped data. Some additional goodness-of-fit tests based more directly on the individual observations are mentioned in the next section.


Trumpler and Weaver (1953) provided data collected at the Lick Observatory on the radial velocities of 80 bright stars, as shown in Table 13.13.


TABLE 13.13 Observed and expected frequencies for chi-square goodness-of-fit test with estimates $\hat{\mu} = -20.3$ and $\hat{\sigma} = 12.7$

Interval of      $o_j$    $\hat{p}_{j0}$    Pooled $o_j$    Pooled $\hat{p}_{j0}$    $\hat{e}_j$
Velocities
(-80, -70)       1        .000
(-70, -60)       2        .001
(-60, -50)       2        .009              15              .224                     17.92
(-50, -40)       2        .051
(-40, -30)       8        .163
(-30, -20)       24       .284              24              .284                     22.72
(-20, -10)       26       .283              26              .283                     22.64
(-10, 0)         11       .154
(0, 10)          2        .046              15              .209                     16.72
(10, 20)         1        .008
(20, 30)         1        .001
Total            80       1.000             80              1.000                    80.00


We wish to test for normality, $H_0: X \sim N(\mu, \sigma^2)$. We will use the chi-square test with $\mu$ and $\sigma$ estimated by the MLEs based on the grouped data of Table 13.13. If we denote the arbitrary $j$th cell by $A_j = (a_j, a_{j+1})$ and if $z_j = (a_j - \mu)/\sigma$, then the likelihood function based on the multinomial data is

$$L = \frac{n!}{o_1! \cdots o_c!}\prod_{j=1}^{c}[\Phi(z_{j+1}) - \Phi(z_j)]^{o_j}$$

The likelihood equations are obtained by equating to zero the partial derivatives of the logarithm of $L$ with respect to $\mu$ and $\sigma$. Specifically, we obtain

$$\sum_{j=1}^{c} o_j\frac{\phi(z_{j+1}) - \phi(z_j)}{\Phi(z_{j+1}) - \Phi(z_j)} = 0 \qquad (13.7.3)$$

$$\sum_{j=1}^{c} o_j\frac{z_{j+1}\phi(z_{j+1}) - z_j\phi(z_j)}{\Phi(z_{j+1}) - \Phi(z_j)} = 0 \qquad (13.7.4)$$

Equations (13.7.3) and (13.7.4) must be solved by an iterative numerical method, and for the data of Table 13.13 the estimates of $\mu$ and $\sigma$ are $\hat{\mu} = -20.3$ and $\hat{\sigma} = 12.7$. The estimated cell probabilities are of the form $\hat{p}_{j0} = \Phi(\hat{z}_{j+1}) - \Phi(\hat{z}_j)$ with $\hat{z}_j = (a_j - \hat{\mu})/\hat{\sigma}$. Each $\hat{p}_{j0}$ must be at least $5/80 = 0.0625$ to ensure that $\hat{e}_j \geq 5$. To satisfy this requirement in the present example, it is necessary to pool the first five cells and, similarly, the last four cells. This reduces the number of cells to $c = 4$ with the pooled results shown in Table 13.13. It follows that

$$\chi^2 = \sum_{j=1}^{4}\frac{(o_j - \hat{e}_j)^2}{\hat{e}_j} = 1.22 < 3.84 = \chi^2_{0.95}(1)$$

Thus, the normal model gives a reasonable fit in this example.


It might seem that a simpler method could be based on the following often-used grouped-data estimates:

$$\hat{\mu} = \frac{1}{n}\sum_{j=1}^{c} o_j m_j, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^{c} o_j(m_j - \hat{\mu})^2$$

where $m_j$ is the midpoint of the $j$th interval. However, for this example, these estimates are $\hat{\mu} = -21$ and $\hat{\sigma} = 16$. The latter estimate of $\sigma$ is somewhat larger than the grouped-data MLE. In fact, it is large enough that the chi-square test based on this estimate rejects the normal model, contrary to our earlier conclusion. Another type of simple closed-form estimate for grouped data will be discussed in Chapter 15.

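The contrast between the two sets of estimates can be checked numerically. The sketch below computes the midpoint estimates (assuming division by $n$) and evaluates the pooled four-cell chi-square at both the reported grouped-data MLEs and the midpoint estimates. It treats the two outer pooled cells as open-ended and does not itself solve (13.7.3) and (13.7.4); the helper names are ours, and small differences from the rounded table entries are expected.

```python
import math


def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Table 13.13: interval endpoints (width 10) and counts of radial velocities
edges = list(range(-80, 31, 10))                 # -80, -70, ..., 30
counts = [1, 2, 2, 2, 8, 24, 26, 11, 2, 1, 1]
n = sum(counts)
mids = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]

# Midpoint (grouped-data) estimates, dividing by n
mu_mid = sum(o * m for o, m in zip(counts, mids)) / n
sigma_mid = math.sqrt(sum(o * (m - mu_mid) ** 2 for o, m in zip(counts, mids)) / n)

# Pooled cells used in the text: below -30, (-30, -20), (-20, -10), above -10
cuts = [-30, -20, -10]
pooled_obs = [15, 24, 26, 15]


def pooled_chi2(mu, sigma):
    """Chi-square over the four pooled cells, with open-ended outer cells."""
    cdf = [0.0] + [phi_cdf((c - mu) / sigma) for c in cuts] + [1.0]
    expected = [n * (b - a) for a, b in zip(cdf[:-1], cdf[1:])]
    return sum((o - e) ** 2 / e for o, e in zip(pooled_obs, expected))


df = len(pooled_obs) - 1 - 2                     # c - 1 - k with k = 2
print(round(mu_mid, 2), round(sigma_mid, 2))
print(round(pooled_chi2(-20.3, 12.7), 2), round(pooled_chi2(mu_mid, sigma_mid), 2), df)
```

Compared with $\chi^2_{0.95}(1) = 3.84$, the first statistic leads to acceptance of normality and the second to rejection, as described above.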

13.8 OTHER GOODNESS-OF-FIT TESTS


Several goodness-of-fit tests have been developed in terms of the empirical distribution function (EDF). The basic principle is to see how closely the observed sample cumulative distribution agrees with the hypothesized theoretical cumulative distribution. Several methods for measuring the closeness of agreement have been proposed.

The EDF tests generally are considered to be more powerful than the chi-squared goodness-of-fit test, because they make more direct use of the individual observations. Of course, they are not applicable if the data are available only in grouped form.

Let $x_{1:n}, \ldots, x_{n:n}$ denote an ordered random sample of size $n$. Then the EDF or sample CDF is denoted by $\hat{F}(x)$, and at the ordered values note that

$$\hat{F}(x_{i:n}) = i/n$$

If we wish to test a completely specified hypothesis, $H_0: X \sim F(x)$, then the general approach is to measure how close the agreement is between $\hat{F}(x_{i:n})$ and $F(x_{i:n})$ for $i = 1, \ldots, n$. Some slight modifications have been found helpful, such as using a half-point correction and comparing the hypothesized $F(x_{i:n})$ values to $(i - 0.5)/n$ rather than to $i/n$.

Because $U = F(X) \sim UNIF(0, 1)$ when $F$ is continuous, the test of any completely specified $H_0$ can be expressed equivalently as a test for uniformity, where the $U_{i:n} = F(X_{i:n})$ are distributed as ordered uniform variables.


CRAMÉR-VON MISES TEST FOR COMPLETELY SPECIFIED $H_0$


The Cramér-von Mises (CVM) test can be modified to apply to Type II censored samples. If the $r$ smallest ordered observations from a sample of size $n$ are available, then the CVM test statistic for testing $H_0: X \sim F(x)$ is given by

$$CM = \frac{1}{12n} + \sum_{i=1}^{r}\left[F(x_{i:n}) - \frac{i - 0.5}{n}\right]^2 \qquad (13.8.1)$$

The distribution of CM under $H_0$ is the same as the distribution of

$$\frac{1}{12n} + \sum_{i=1}^{r}\left(U_{i:n} - \frac{i - 0.5}{n}\right)^2$$

where the $U_{i:n}$ are ordered uniform variables. The asymptotic percentage points for CM have been obtained by Pettitt and Stephens (1976). They appear to be sufficiently accurate for practical purposes for samples as small as 10 or so.

An approximate size $\alpha$ test of $H_0: X \sim F$ is to reject $H_0$ if $CM > CM_{1-\alpha}$, where the critical values $CM_{1-\alpha}$ are provided in Table 9 (Appendix C) for several values of $\alpha$ and censoring levels.


We are given that 25 system failures have occurred in a 100-day period, and we wish to test whether the failure times are uniformly distributed, $H_0: F(x) = x/100$, $0 < x < 100$, where the 25 ordered observations are as follows:

5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5, 50.3, 56.4, 61.7,
62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1, 85.5, 90.8, 92.7, 95.5, 95.6

We have

$$CM = \frac{1}{12(25)} + \sum_{i=1}^{25}\left(\frac{x_{i:25}}{100} - \frac{i - 0.5}{25}\right)^2 = 0.182$$

Because $CM_{0.90} = 0.347$, we cannot reject $H_0$ at the $\alpha = 0.10$ level of significance.

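A minimal sketch of the CM statistic for the uniform example, assuming the form $CM = 1/(12n) + \sum_{i=1}^{r}[u_i - (i - 0.5)/n]^2$ with $u_i = F(x_{i:n})$; the function name `cramer_von_mises` is ours. For a Type II censored sample, pass the $r$ smallest transformed values together with the full sample size `n`. The 0.347 critical value is copied from the text, not computed.

```python
def cramer_von_mises(u_sorted, n=None):
    """CM statistic from ordered values u_i = F(x_{i:n}) computed under H0.

    u_sorted holds the r smallest transformed observations; n is the full
    sample size (defaults to r, i.e., an uncensored sample).
    """
    r = len(u_sorted)
    n = r if n is None else n
    return 1.0 / (12 * n) + sum((u - (i - 0.5) / n) ** 2
                                for i, u in enumerate(u_sorted, start=1))


# Failure times from the example; H0: F(x) = x/100 on (0, 100)
times = [5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5, 50.3, 56.4,
         61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1, 85.5, 90.8, 92.7, 95.5, 95.6]
u = sorted(t / 100 for t in times)
cm = cramer_von_mises(u)
print(round(cm, 3), cm > 0.347)   # 0.347: tabled 0.90 critical value quoted above
```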
If a Poisson process is observed for a fixed time t, then given the number of 
occurrences, the successive failure times are conditionally distributed as ordered 
uniform variables. This suggests that the above data could represent data from a 
Poisson process (see Chapter 16). 
If the above data represent successive failure times from a Poisson process, then the times between failures should be independent exponential variables. The interarrival times are as follows:


5.2. °8.4.50:9, Oly 8.9) 179.23.6, 2.5, 10.24.85 18,04. 3.3, 12,122, 
3.0, 3.5, 7.6, 3.4, 0.5, 2.4, 5.3, 1.9, 2.8, 0.1 


If we wish to use the CM statistic to test these data for exponentiality, we must completely specify $H_0$. Suppose that we test $H_0: Y \sim EXP(5)$. We have

$$CM = \frac{1}{12(25)} + \sum_{i=1}^{25}\left(1 - e^{-y_{i:25}/5} - \frac{i - 0.5}{25}\right)^2 = 0.478$$

Because $CM_{0.99} = 0.743$, we see that we cannot reject the hypothesis that the interarrival times follow an EXP(5) distribution at the $\alpha = 0.01$ significance level.


CRAMER-VON MISES TEST, PARAMETERS ESTIMATED 


As suggested in the previous section, we often are more interested in testing whether a certain family of distributions is applicable than in testing a completely specified hypothesis. To test $H_0: X \sim F(x; \theta)$, where $\theta$ is unspecified, we may consider

$$\widehat{CM} = \frac{1}{12n} + \sum_{i=1}^{n}\left[F(x_{i:n}; \hat{\theta}) - \frac{i - 0.5}{n}\right]^2 \tag{13.8.2}$$


where $\hat{\theta}$ denotes the MLE of $\theta$. In general, the distribution of $\widehat{CM}$ may or may not depend on unknown parameters; however, we know that if $\theta = (\theta_1, \theta_2)$ are location-scale parameters, then $F(X; \hat{\theta}_1, \hat{\theta}_2)$ and $F(X_{i:n}; \hat{\theta}_1, \hat{\theta}_2)$ are pivotal quantities whose distributions do not depend on the parameters. Thus, at least in the case of location-scale parameters, $\widehat{CM}$ provides a suitable test statistic whose critical values can be determined. The disadvantage in this case is that the critical values depend on the form of $F$ being tested, whereas in the original situation the same critical values are applicable for testing any completely specified hypothesis. Some asymptotic and simulated critical values are available in the literature for certain models such as the exponential, normal, and Weibull distributions. Stephens considers slight modifications of the test statistic so that the asymptotic critical values are quite accurate even for small $n$ for complete-sample tests of normality and exponentiality. Some of these results, along with the Weibull case, are included in Table 10 (Appendix C). Pettitt and Stephens (1976) and Stephens (1977) provide additional results, including the censored case.
Example 13.8.2 Now we may test whether the interarrival times in the previous example follow any exponential distribution, $\text{EXP}(\theta)$. In this case, $\hat{\theta} = \bar{y} = 3.7$, and

$$\widehat{CM} = \frac{1}{12(25)} + \sum_{i=1}^{25}\left(1 - e^{-y_{i:25}/3.7} - \frac{i - 0.5}{25}\right)^2 = 0.051$$

Because $(1 + 0.16/n)\widehat{CM} = 0.051 < 0.177$, we cannot reject $H_0$ at the $\alpha = 0.10$ level.
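The estimated-parameter version (13.8.2) can be sketched in Python as below. Two assumptions are worth stating: the interarrival times are recomputed as differences of the failure times listed in Example 13.8.1, and $\hat{\theta}$ is their sample mean, $95.6/25 = 3.824$, rather than the printed 3.7. The statistic therefore comes out near 0.06 instead of 0.051, but the modified value $(1 + 0.16/n)\widehat{CM}$ is still well below 0.177.

```python
import math
from typing import Callable, Sequence


def cvm_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """CM = 1/(12n) + sum_i [F(x_{i:n}) - (i - 0.5)/n]^2, as in (13.8.1) and (13.8.2)."""
    x = sorted(data)
    n = len(x)
    return 1.0 / (12 * n) + sum(
        (cdf(xi) - (i - 0.5) / n) ** 2 for i, xi in enumerate(x, start=1)
    )


failure_times = [5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5,
                 50.3, 56.4, 61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1,
                 85.5, 90.8, 92.7, 95.5, 95.6]

# Interarrival times: gaps between successive failures (the first gap is measured from 0).
interarrival = [round(b - a, 1) for a, b in zip([0.0] + failure_times[:-1], failure_times)]
n = len(interarrival)
theta_hat = sum(interarrival) / n  # MLE of the exponential mean is the sample mean

cm_hat = cvm_statistic(interarrival, lambda y: 1.0 - math.exp(-y / theta_hat))
cm_modified = (1.0 + 0.16 / n) * cm_hat  # modified statistic compared with 0.177
print(f"theta_hat = {theta_hat:.3f}, CM = {cm_hat:.3f}, modified = {cm_modified:.3f}")
print("reject H0 at alpha = 0.10:", cm_modified > 0.177)
```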


KOLMOGOROV-SMIRNOV OR KUIPER TEST 


The Kolmogorov-Smirnov (KS) test statistic is based on the maximum difference between the sample CDF and the hypothesized CDF. To test a completely specified $H_0: X \sim F(x)$, let

$$D^+ = \max_{i}\left[\frac{i}{n} - F(x_{i:n})\right] \tag{13.8.3}$$

$$D^- = \max_{i}\left[F(x_{i:n}) - \frac{i - 1}{n}\right] \tag{13.8.4}$$

$$D = \max(D^+, D^-) \tag{13.8.5}$$

$$V = D^+ + D^- \tag{13.8.6}$$


Carets will be added if unknown parameters must be estimated. The first three statistics are KS statistics, and $V$ is Kuiper's test statistic. For a completely specified continuous $F$, the null distributions of these statistics do not depend on $F$. Also, as in the CVM case, if location-scale parameters are estimated, then the distributions will not depend on the parameters, but they then will depend on the form of $F$. The KS statistics allow for one-sided alternatives, and they also have been extended to two-sample problems.

Stephens (1974, 1977) has derived asymptotic critical values for these statistics, and he has considered modifications so that these critical values are good for small $n$. Some of these results are summarized in Table 11 (Appendix C). The Weibull results are provided by Chandra et al. (1981), who also provide more accurate small-sample results for this case, as well as percentage points for $D^+$ and $D^-$. These results were developed for the extreme-value distribution for maximums, which is related to the Weibull distribution by a monotonically decreasing transformation; thus the $D^+$ and $D^-$ critical values are interchanged when applying them directly to the Weibull distribution.

Many other EDF-type test statistics, as well as tests devised specifically for a certain model, are available in the literature. Other references in this area include Aho et al. (1983), Dufour and Maag (1978), Koziol (1980), and Bain and Engelhardt (1983).
Example 13.8.3 Let us rework Example 13.8.2 using the Kolmogorov-Smirnov test statistic. A plot of the EDF $F_n(y_{i:n}) = i/n$ and the estimated CDF

$$F(y_{i:n}; \hat{\theta}) = 1 - e^{-y_{i:n}/3.7}$$

is given in Figure 13.1. The value of $D$ occurs at the fifth order statistic:

$$D = D^- = 1 - e^{-1.2/3.7} - \frac{5 - 1}{25} = 0.117$$

The modified test statistic is

$$\left(\sqrt{25} + 0.26 + \frac{0.5}{\sqrt{25}}\right)\left(D - \frac{0.2}{25}\right) = (5 + 0.26 + 0.1)(0.117 - 0.2/25) = 0.584 < 0.995$$

so again we cannot reject the hypothesis of exponentiality at the 0.10 significance level.


FIGURE 13.1 Comparison of an empirical CDF with an exponential CDF with estimated mean $\hat{\theta} = 3.7$
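The four statistics (13.8.3) through (13.8.6) are short to compute; the Python sketch below is illustrative only, and the helper name `ks_kuiper` is our own. As in the previous sketch, the interarrival times are recomputed from the failure times and $\hat{\theta} = 3.824$ (an assumption; the text uses 3.7), so the value of $D$ differs from the printed 0.117. The modification $(\sqrt{n} + 0.26 + 0.5/\sqrt{n})(D - 0.2/n)$ is the one applied in Example 13.8.3, written here for general $n$, and the result remains well below 0.995.

```python
import math
from typing import Callable, Sequence, Tuple


def ks_kuiper(data: Sequence[float],
              cdf: Callable[[float], float]) -> Tuple[float, float, float, float]:
    """Return (D+, D-, D, V) as defined in (13.8.3)-(13.8.6)."""
    f = [cdf(xi) for xi in sorted(data)]        # F evaluated at the order statistics
    n = len(f)
    d_plus = max(i / n - fi for i, fi in enumerate(f, start=1))
    d_minus = max(fi - (i - 1) / n for i, fi in enumerate(f, start=1))
    return d_plus, d_minus, max(d_plus, d_minus), d_plus + d_minus


failure_times = [5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5,
                 50.3, 56.4, 61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1,
                 85.5, 90.8, 92.7, 95.5, 95.6]
interarrival = [round(b - a, 1) for a, b in zip([0.0] + failure_times[:-1], failure_times)]
n = len(interarrival)
theta_hat = sum(interarrival) / n  # sample mean, the MLE of the exponential mean

d_plus, d_minus, d, v = ks_kuiper(interarrival, lambda y: 1.0 - math.exp(-y / theta_hat))
# Modified statistic for the exponential case with estimated mean (form used in Example 13.8.3).
d_modified = (math.sqrt(n) + 0.26 + 0.5 / math.sqrt(n)) * (d - 0.2 / n)
print(f"D+ = {d_plus:.3f}, D- = {d_minus:.3f}, D = {d:.3f}, V = {v:.3f}")
print(f"modified D = {d_modified:.3f}; reject at alpha = 0.10: {d_modified > 0.995}")
```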


SUMMARY 


Our purpose in this chapter was to introduce several tests that are designed 
either to determine whether a hypothesized distribution provides an adequate 
model or to determine whether random variables are independent. 

Chi-square tests are based on the relative sizes of the differences between 
observed frequencies and the theoretical frequencies predicted by the model. They 
have the advantages of being fairly simple and also of having an approximate
chi-square distribution. 

Tests for independence based on contingency tables use the differences between
observed frequencies of joint occurrence and estimates of these frequencies made 
under the assumption of independence. Other goodness-of-fit tests, such as the 
CVM and KS tests, are based on the differences between the empirical CDF, 
computed from a random sample, and the CDF of the hypothesized model. 
These tests generally are harder to work with numerically, but they tend to have 
good power relative to many commonly used alternatives. 

All of the goodness-of-fit tests considered here are designed primarily for 
testing completely specified hypotheses, but all can be adapted for testing com- 
posite hypotheses by estimating unknown parameters of the model. When this is 
the case, the power of the test is less than for completely specified hypotheses, 
and the critical values needed to perform the test are changed. This is taken care 
of easily for chi-square tests by adjusting the number of degrees of freedom (one 
degree of freedom is subtracted for each parameter estimated). The situation is
not as convenient for the CVM and KS tests, because new tables of critical values 
are required, and these must be obtained for the specific parametric form that is 
being tested (normal, Weibull, etc.). 


EXERCISES 


1. A baseball player is believed to be a .300 hitter. In his first 100 at bats in a season he gets 20 hits. Use equation (13.2.1) to test $H_0: p = 0.3$ against $H_1: p \ne 0.3$ at the $\alpha = 0.10$ significance level. What would you do if you wanted a one-sided test?


2. You flip a coin 20 times and observe 7 heads. Test whether the coin is unbiased at the $\alpha = 0.10$ significance level. Use equations (13.2.1) and (13.2.2).


3. Consider Example 13.3.1, but suppose that the following data are observed:

          Present   Absent   Total
Race 1       10       40       50
Race 2       50       50      100
Race 3       30       20       50

(a) Test $H_0: p_1 = 0.25$, $p_2 = 0.50$, $p_3 = 0.50$ at $\alpha = 0.10$.
(b) Test $H_0: p_1 = p_2 = p_3 = 0.50$ at $\alpha = 0.10$.
(c) Test $H_0: p_1 = p_2 = p_3$ at $\alpha = 0.10$.


4. A system contains four components that operate independently. Let $p_i$ denote the probability of successful operation of the $i$th component. Test $H_0: p_1 = 0.90$, $p_2 = 0.90$, $p_3 = 0.80$, $p_4 = 0.80$, if in 50 trials the components operated successfully as follows:

Component     1    2    3    4
Successful   40   48   45   40

5. In a certain genetic problem it is believed that brown will occur with probability 1/4, white with probability 1/4, and spotted with probability 1/2.

(a) Test the hypothesis that this model is correct at $\alpha = 0.10$, if the following results were observed in 40 trials:

           Brown   White   Spotted
Observed     5      15       20

(b) Test the hypothesis that the probabilities are 1/9, 4/9, and 4/9, respectively, at $\alpha = 0.10$.
6. A sample of 36 cards is drawn with replacement from a stack of 52 cards with the following results:

Spades   Hearts   Diamonds   Clubs
   6        8         9        13

Test the hypothesis at $\alpha = 0.05$ that equal numbers of each suit are in the stack of cards.


7. Three cards are drawn from a standard deck of 52 cards, and we are interested in the number of hearts obtained. The possible outcomes are $x_i = 0, 1, 2$, or 3, and if we assumed the usual sampling-without-replacement scheme, we would hypothesize that these values would occur with probabilities $p_i = f(x_i)$, where

$$f(x) = \binom{13}{x}\binom{39}{3 - x}\Big/\binom{52}{3}$$

The following data are available from 100 trials of this experiment. Test $H_0: p_i = f(x_i)$, $i = 1, \ldots, 4$, at the $\alpha = 0.05$ level based on these data:

No. of Hearts      0    1    2    3
Times Occurred    40   45   12    3

Note: Combine the last two cells.


8. A box contains five black marbles and 10 white marbles. Player A and Player B each are asked to draw three marbles from the box and record the number of black marbles obtained. They each do this 100 times, with the following results:

            Observed Outcomes
             0    1    2    3
Player A    25   40   25   10
Player B    40   40   15    5

(a) Use equation (13.5.1) to test the hypothesis that Player A drew the marbles without replacement and that Player B drew the marbles with replacement. Let $\alpha = 0.10$.

(b) Similarly, test the hypothesis that A drew with replacement and B drew without replacement.

(c) Use equation (13.5.2) to test the hypothesis that the two multinomial populations are the same at $\alpha = 0.10$.


9. A question was raised as to whether the county distribution of farm tenancy over a given period of time in Audubon County, Iowa, was the same for three different levels of soil fertility. The following results are quoted from Snedecor (1956, page 225):

Soil    Owned   Rented   Mixed   Total
I         36      67       49     162
II        31      60       49     140
III       58      87       80     225

Test the hypothesis that the multinomial populations are the same for the three different soil fertility levels at $\alpha = 0.10$.


10. Certain airplane component failures may be classified as mechanical, electrical, or otherwise. Two airplane designs are under consideration, and it is desired to test the hypothesis that the type of failure is independent of the airplane design. Test this hypothesis at $\alpha = 0.05$ based on the following data:

            Mech.   Elect.   Other
Design I     50       30       60
Design II    40       30       40


11. A sample of 400 people was asked their degree of support of a balanced budget and their degree of support of public education, with the following results:

                       Supported Balanced Budget
Public Education    Strong   Undecided   Weak
Strong                100        80        20
Undecided              60        50        15
Weak                   20        50         5

Test the hypothesis of independence at $\alpha = 0.05$.


12. A sample of 750 people was selected and classified according to income and stature, with the following results:

                  Income
Stature    Poor   Middle
Thin        100     50      50
Average      50    200      70
Fat         120     60      50

Test the hypothesis that these two factors are independent at $\alpha = 0.10$.

13. A fleet of 50 airplanes was observed for 1000 flying hours, and the number of planes, $m_x$, that suffered $x$ component failures in that time is recorded below:
x + © 4 9 °S- -# -ss 
mot 40 OP BO 
(a) Test $H_0: X \sim \text{POI}(2)$ at $\alpha = 0.01$.

(b) Test the hypothesis that $X$ follows a form of the negative binomial distribution
given by 


ae (yt® Chae 


FAX) = ( . get 
with k = 3 and yt = 1. Use a = 0.10. 


14. Consider the data in Example 4.6.3.
(a) Test $H_0: X \sim \text{EXP}(100)$ at $\alpha = 0.10$.
(b) Test $H_0: X \sim \text{EXP}(\theta)$ at $\alpha = 0.10$.


15. The following data concerning the number of thousands of miles traveled between bus motor failures were adapted from Davis (1952):

                      Observed Bus Motor Failures
Miles (1000)   First   Second   Third   Fourth   Fifth
0-20              6      19       27      34       29
20-40            41      13       16      20       27
40-60            16      13       18      15       44
60-80            25      15       13      15        8
80-100           34      15       11       8        5
100-120          46      18       10       3        2
120-140          33       5        4       4        -
140-160          16       2        0       -        -
160-180           2       2        0       -        -
180-200           2       2        2       -        -

(a) Test the hypothesis that the data for the first bus motor failure follow an exponential distribution at $\alpha = 0.05$.

(b) Test the hypothesis that the data for the fifth bus motor failure follow an exponential distribution at $\alpha = 0.10$.

(c) Test the hypothesis that the data for the first bus motor failure follow a normal distribution at $\alpha = 0.10$.


16. The number of areas $m_y$ receiving $y$ flying bomb hits is given as follows:

Test $H_0: Y \sim \text{POI}(\mu)$ at $\alpha = 0.05$.

17. In Problem 13, test $H_0: X \sim \text{POI}(\mu)$ at $\alpha = 0.05$. Assume that $\bar{x} = 3.25$.


18. The lifetimes in minutes of 100 flashlight cells were observed as follows [Davis (1952)]:

Number of Minutes     0-706   706-746   746-786   786-∞
Observed Frequency      13       36        38       13

Test $H_0: X \sim N(\mu, \sigma^2)$ at $\alpha = 0.10$. Note that $\bar{x} = 746$ and $s = 40$.


19. Consider the weights of 60 major league baseballs given in Exercise 24 of Chapter 4. Test $H_0: X \sim N(\mu, \sigma^2)$.


20. Consider the data in Example 4.6.3.

(a) Use the CM statistic to test $H_0: X \sim \text{EXP}(100)$.

(b) Use the CM statistic to test $H_0: X \sim \text{EXP}(\theta)$.

(c) Use the CM statistic based on the first 20 observations to test $H_0: X \sim \text{EXP}(100)$.

(d) Use the Kolmogorov-Smirnov statistic to test $H_0: X \sim \text{EXP}(100)$. Let $\alpha = 0.10$ throughout.


21. Lieblein and Zelen (1956) provide the following data for the endurance, in millions of revolutions, of deep-groove ball bearings:

17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84, 51.96, 54.12,
55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64, 105.12,
105.84, 127.92, 128.04, 173.40

Test $H_0: X \sim \text{WEI}(\theta, \beta)$ at $\alpha = 0.10$. Note: $\hat{\beta} = 2.102$ and $\hat{\theta} = 81.88$.
(a) Use a chi-squared test.
(b) Use a CVM test.
(c) Use a KS test.
22. Gross and Clark (1975, page 105) consider the following relief times in hours of 20 patients receiving an analgesic:

1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7, 4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7,
2.3, 1.6, 2.0

(a) Test $H_0: X \sim \text{EXP}(\theta)$ at $\alpha = 0.10$. Note that $\bar{x} = 1.90$.

(b) Test $H_0: X \sim N(\mu, \sigma^2)$ at $\alpha = 0.10$.

(c) Test $H_0: X \sim \text{LOGN}(\mu, \sigma^2)$ at $\alpha = 0.10$.

(d) Test $H_0: X \sim \text{WEI}(\theta, \beta)$ at $\alpha = 0.10$. Note that $\hat{\beta} = 2.79$ and $\hat{\theta} = 2.14$.


NONPARAMETRIC METHODS


14.1 INTRODUCTION


Most of the statistical procedures discussed so far have been developed under the assumption that the population or random variable is distributed according to some specified family of distributions, such as normal, exponential, Weibull, or Poisson. In the previous chapter we considered goodness-of-fit tests that are helpful in deciding what model may be applicable in a given problem. Some types of questions can be answered and some inference procedures can be developed without assuming a specific model, and these results are referred to as nonparametric or distribution-free methods. The advantages of nonparametric methods are that fewer assumptions are required, and in many cases only nominal (categorized) data or ordinal (ranked) data are required, rather than numerical (interval) data. A disadvantage of nonparametric methods is that we usually prefer to have a well-defined model with important parameters such as means and variances included in the model for interpretation purposes. In any event, many important questions can be answered by a nonparametric approach, and some of these results will be given here.

The CVM goodness-of-fit test already discussed is an example of one type of distribution-free result. This type of result depends on the fact that $U = F(X) \sim \text{UNIF}(0, 1)$ for a continuous variable, and distributional results can be obtained for functions of $F(X)$ in terms of the uniform distribution, which hold for all $F$. For the more classical nonparametric tests, the probability structure is induced by the sampling or randomization procedures used, as in the counting-type probability problems considered in Chapter 1.


14.2 ONE-SAMPLE SIGN TEST

Consider a continuous random variable $X \sim F(x)$, and let $m$ denote the median of the distribution. That is, $P[X < m] = P[X > m] = 1/2$. We wish to test $H_0: m = m_0$ against $H_1: m > m_0$. This is a test for location, and it is thought of as analogous to a test for means in a parametric case. Indeed, for a symmetric distribution, the mean and the median are equal.

Now we take a random sample of $n$ observations and let $T$ be the number of $x_i$'s that are less than $m_0$. That is, we could consider the sign of $(x_i - m_0)$, $i = 1, \ldots, n$, and let

$$T = \text{Number of negative signs} \tag{14.2.1}$$

Note that we do not really need numerical interval-scale data here; we need only to be able to rank the responses as less than $m_0$ or greater than $m_0$.

Under $H_0: m = m_0$, we have $P[X < m_0] = P[X - m_0 < 0] = 1/2$, so the probability of a negative sign is $p_0 = 1/2$. Under the alternative $m_1 > m_0$, we have $p_1 = P[X < m_0] < P[X < m_1] = 1/2$. Clearly the statistic $T$ follows a binomial distribution, and when $m = m_0$,

$$T \sim \text{BIN}(n, p_0) \qquad \text{where } p_0 = P[X < m_0] = 1/2$$

A test of $H_0: m = m_0$ against $H_1: m > m_0$ based on $T$ is equivalent to the binomial test of $H_0: p = p_0 = 1/2$ against $H_1: p < 1/2$, where $T$ represents the number of successes, and a success corresponds to a negative sign for $(x_i - m_0)$. That is, for the alternative $m > m_0$, $H_0$ is rejected if $T$ is small, as described in Theorem 12.4.1 earlier.

Theorem 14.2.1 Let $X \sim F(x)$ and $F(m) = 1/2$. A size $\alpha$ test of $H_0: m = m_0$ against $H_1: m > m_0$ is to reject $H_0$ if

$$B(t; n, 1/2) \le \alpha$$

where $t$ = number of negative signs of $(x_i - m_0)$ for $i = 1, \ldots, n$.
The other one-sided or two-sided alternatives would be carried out similarly. The usual normal approximation could be used for moderate sample sizes. Also note that the sign test corresponds to a special case of the chi-square goodness-of-fit procedure considered earlier. In that case, $c$ categories were used to reduce the problem to a multinomial problem; here we have just two categories, plus and minus, with each category equally likely under $H_0$. Of course, the sign test is designed to make inferences specifically about the median of the original population.

The sign test for the median has the advantage that it is valid whatever the distribution $F$. If the true $F$ happens to be normal, then one may wonder how much is lost by using the sign test compared to using the usual $t$ test for means, which was derived under the normality assumption. One way of comparing two tests is to consider the ratio of the sample sizes required to achieve a given power. To test $H_0: \theta = \theta_0$, let $n_1(\theta_1)$ be the sample size required to achieve a specified power at $\theta_1$ for test one, and $n_2(\theta_1)$ the sample size required to achieve the same power for test two. Then the (Pitman) asymptotic relative efficiency (ARE) of test two compared to test one is given by

$$\text{ARE} = \lim_{\theta_1 \to \theta_0} \frac{n_1(\theta_1)}{n_2(\theta_1)} \tag{14.2.2}$$

If the test statistics are expressed in terms of point estimators of the parameter, then in many cases the ARE of the tests corresponds to the ratio of the variances of the corresponding point estimators. Thus there is often a connection between the relative efficiency of a test and the relative efficiency of point estimators as defined earlier. This aspect of the problem will not be developed further here, but it can be shown that under normality, the ARE of the sign test compared with the $t$ test is $2/\pi \approx 0.64$, and the efficiency increases to approximately 95% for small $n$. That is, when normality holds, a $t$ test based on 64 observations would give about the same power as a sign test based on 100 observations. Of course, if the normality assumption is not true, then the $t$ test is not valid. Another restriction is that interval-scale data are needed for the $t$ test.


Example 14.2.1 The median income in a certain profession is \$24,500. The contention is that taller men earn higher wages than shorter men, so a random sample of 20 men who are six feet or taller is obtained. Their (ordered) incomes in thousands of dollars are as follows:

10.8, 12.7, 13.9, 18.1, 19.4, 21.3, 23.5, 24.0, 24.6, 25.0,
25.4, 27.7, 30.1, 30.6, 32.3, 33.3, 34.7, 38.8, 40.3, 55.5

To test $H_0: m = 24{,}500$ against $H_1: m > 24{,}500$, we compute $T = 8$ negative signs. The p-value for this test based on this statistic is $B(8; 20, 0.5) = 0.2517$. Thus, based on this statistic, we do not have strong evidence to reject $H_0$ and support the claim that taller men have higher incomes.
Note that if any of the observed values are exactly $m_0$, then these values should be discarded and the sample size reduced accordingly. Note also that the sign test would have been unaffected in this example if the workers had been unwilling to give their exact incomes but willing to indicate whether they were less than \$24,500 or more than \$24,500.
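A minimal Python sketch of Theorem 14.2.1 applied to Example 14.2.1 is shown below; the function names are our own. It discards observations equal to $m_0$, counts negative signs, and returns the exact p-value $B(t; n, 1/2)$ computed from the binomial sum rather than from a table.

```python
from math import comb
from typing import Sequence, Tuple


def binom_cdf(t: int, n: int, p: float) -> float:
    """Cumulative binomial probability B(t; n, p) = P[T <= t]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


def sign_test_greater(data: Sequence[float], m0: float) -> Tuple[int, int, float]:
    """One-sample sign test of H0: m = m0 against H1: m > m0.

    Observations equal to m0 are discarded. Returns (t, n, p-value),
    where t is the number of negative signs and p-value = B(t; n, 1/2)."""
    kept = [x for x in data if x != m0]
    n = len(kept)
    t = sum(1 for x in kept if x < m0)
    return t, n, binom_cdf(t, n, 0.5)


incomes = [10.8, 12.7, 13.9, 18.1, 19.4, 21.3, 23.5, 24.0, 24.6, 25.0,
           25.4, 27.7, 30.1, 30.6, 32.3, 33.3, 34.7, 38.8, 40.3, 55.5]
t, n, p_value = sign_test_greater(incomes, 24.5)   # incomes in thousands of dollars
print(t, n, round(p_value, 4))  # 8 20 0.2517
```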


14.3 BINOMIAL TEST (TEST ON QUANTILES)

Clearly a test for any quantile (or percentile) can be set up in the same manner as the sign test for medians. We may wish to test the hypothesis that $x_0$ is the $p_0$th percentile of a distribution $F(x)$, for some specified value $p_0$; that is, we wish to test

$$H_0: x_{p_0} = x_0 \qquad (H_0: P[X \le x_0] = F(x_0) = p_0)$$

against

$$H_1: x_{p_0} > x_0 \qquad (H_1: F(x_0) < p_0)$$

Let $t$ = number of negative signs of $(x_i - x_0)$ for $i = 1, \ldots, n$. Then when $H_0$ is true, $T \sim \text{BIN}(n, p_0)$, and this test is equivalent to a binomial test of $H_0: p = p_0$ against $H_1: p < p_0$, where $p_0$ is the probability of a negative sign under $H_0$.

Theorem 14.3.1 Let $X \sim F(x)$ and $F(x_p) = p$. For a specified $p_0$, a size $\alpha$ test of $H_0: x_{p_0} = x_0$ against $H_1: x_{p_0} > x_0$ is to reject $H_0$ if

$$B(t; n, p_0) \le \alpha$$

where $t$ is the number of $x_i$ that are smaller than $x_0$ in a random sample of size $n$.

The other one-sided and two-sided alternative tests may be carried out in a similar manner, using the binomial test described in Theorem 12.4.2. These tests could be modified to apply to a discrete random variable $X$ if care is taken with the details involved.

Example 14.3.1 In the study of Example 14.2.1, we wish to establish that the 25th percentile for tall men is less than \$24,500. That is, we test $H_0: x_{0.25} = 24{,}500$ against $H_1: x_{0.25} < 24{,}500$. This is equivalent to testing $H_0: F(24{,}500) = 0.25$ against $H_1: F(24{,}500) > 0.25$. We find from the data that $t = 8$, and the corresponding p-value is

$$1 - B(t - 1; n, p_0) = 1 - B(7; 20, 0.25) = 0.102$$
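The p-value in Example 14.3.1 can be reproduced with the following Python sketch (function names are ours). Because $H_1: F(x_0) > p_0$ makes many observations below $x_0$ likely, large $t$ is evidence against $H_0$, so the p-value is the upper tail $P[T \ge t] = 1 - B(t - 1; n, p_0)$.

```python
from math import comb
from typing import Sequence, Tuple


def binom_cdf(t: int, n: int, p: float) -> float:
    """Cumulative binomial probability B(t; n, p) = P[T <= t]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


def quantile_test_less(data: Sequence[float], x0: float, p0: float) -> Tuple[int, float]:
    """Test H0: x_{p0} = x0 against H1: x_{p0} < x0 (that is, F(x0) > p0).

    Returns (t, p-value) with t = number of observations below x0 and
    p-value = P[T >= t] = 1 - B(t - 1; n, p0) under T ~ BIN(n, p0)."""
    t = sum(1 for x in data if x < x0)
    n = len(data)
    return t, 1.0 - binom_cdf(t - 1, n, p0)


incomes = [10.8, 12.7, 13.9, 18.1, 19.4, 21.3, 23.5, 24.0, 24.6, 25.0,
           25.4, 27.7, 30.1, 30.6, 32.3, 33.3, 34.7, 38.8, 40.3, 55.5]
t, p_value = quantile_test_less(incomes, 24.5, 0.25)
print(t, round(p_value, 3))  # 8 0.102
```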
An alternate expression for the test on quantiles can be given in terms of order statistics. The outcome $T = t$ is equivalent to having $X_{t:n} < x_0 < X_{t+1:n}$. We know that confidence intervals for a parameter can be associated with the values of the parameter for which rejection of $H_0$ would not occur in a test of hypothesis. In developing distribution-free confidence intervals for a quantile, the common practice is to express these directly in terms of the order statistics.


CONFIDENCE INTERVAL FOR A QUANTILE

Consider a continuous random variable $X \sim F(x)$ and let $F(x_p) = p$. Let $Z_i = F(X_{i:n})$. Then

$$g(x_{1:n}, \ldots, x_{n:n}) = n!\, f(x_{1:n}) \cdots f(x_{n:n}) \tag{14.3.1}$$

$$h(z_1, \ldots, z_n) = n! \qquad 0 < z_1 < \cdots < z_n < 1 \tag{14.3.2}$$

and

$$h_k(z_k) = \frac{n!}{(k - 1)!\,(n - k)!}\, z_k^{k-1}(1 - z_k)^{n-k} \qquad 0 < z_k < 1 \tag{14.3.3}$$

Now

$$P[X_{k:n} \le x_p] = P[F(X_{k:n}) \le F(x_p)] = P[Z_k \le p] = \int_0^p h_k(z_k)\, dz_k \tag{14.3.4}$$

This integral represents an incomplete beta function, and for integer $k$ it can be expressed in terms of the cumulative binomial distribution, where

$$P[X_{k:n} \le x_p] = \sum_{j=k}^{n}\binom{n}{j} p^j (1 - p)^{n-j} = \gamma(k, n, p) \tag{14.3.5}$$

Thus for a given $p$th percentile, the $k$th order statistic provides a lower $\gamma(k, n, p)$ level confidence limit for $x_p$, where $k$ and $n$ can be chosen to achieve a particular desired level. Binomial tables can be used for small $n$ and the normal approximation for larger $n$.

In a similar fashion,

$$P[X_{k:n} \ge x_p] = 1 - P[X_{k:n} < x_p] = B(k - 1; n, p) \tag{14.3.6}$$

and a desired upper confidence limit can be obtained by proper choice of $k$ and $n$.
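The identity between the incomplete beta integral (14.3.4) and the binomial sum (14.3.5) can be checked numerically. The Python sketch below, with helper names of our own, integrates $h_k$ by composite Simpson's rule and compares the result with the binomial tail for $k = 2$, $n = 20$, $p = 0.25$, the case used in Example 14.3.2.

```python
from math import comb, factorial


def h_k(z: float, k: int, n: int) -> float:
    """Density (14.3.3) of Z_k = F(X_{k:n}), i.e. the BETA(k, n - k + 1) density."""
    const = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return const * z ** (k - 1) * (1 - z) ** (n - k)


def simpson(f, a: float, b: float, m: int = 2000) -> float:
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def gamma_binomial(k: int, n: int, p: float) -> float:
    """Right-hand side of (14.3.5): sum_{j=k}^{n} C(n, j) p^j (1 - p)^(n - j)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


k, n, p = 2, 20, 0.25
integral = simpson(lambda z: h_k(z, k, n), 0.0, p)   # left side, from (14.3.4)
binomial = gamma_binomial(k, n, p)                    # right side, from (14.3.5)
print(round(integral, 4), round(binomial, 4))  # 0.9757 0.9757
```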


A two-sided confidence interval for a given percentile $x_p$ also can be developed. For $i < j$, note that

$$P[X_{i:n} \le x_p] = P[X_{i:n} \le x_p < X_{j:n}] + P[X_{j:n} \le x_p] \tag{14.3.7}$$

because either $X_{j:n} \le x_p$ or $X_{j:n} > x_p$, so

$$P[X_{i:n} \le x_p < X_{j:n}] = P[X_{i:n} \le x_p] - P[X_{j:n} \le x_p] \tag{14.3.8}$$

Again one would attempt to find combinations of $i$, $j$, and $n$ to provide the desired confidence level.

It can be shown that equation (14.3.8) provides a conservative confidence interval if $F$ is discrete.

Example 14.3.2 We now wish to compute a confidence interval for the 25th percentile in the previous example. We note that

$$P[X_{2:20} \le x_{0.25}] = 1 - B(1; 20, 0.25) = 0.9757$$

and

$$P[X_{10:20} \ge x_{0.25}] = B(9; 20, 0.25) = 0.9861$$

Thus, $(x_{2:20}, x_{10:20}) = (12.7, 25.0)$ is a two-sided confidence interval for $x_{0.25}$ with confidence coefficient $1 - 0.0243 - 0.0139 = 0.9618$.
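Example 14.3.2 can be reproduced with the Python sketch below; the helper `quantile_ci_level` is a hypothetical name of ours. It evaluates (14.3.8) through (14.3.5) as $[1 - B(i - 1; n, p)] - [1 - B(j - 1; n, p)] = B(j - 1; n, p) - B(i - 1; n, p)$ and reads the endpoints off the ordered incomes.

```python
from math import comb


def binom_cdf(t: int, n: int, p: float) -> float:
    """Cumulative binomial B(t; n, p) = P[T <= t]; equals 0 when t < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


def quantile_ci_level(i: int, j: int, n: int, p: float) -> float:
    """Confidence coefficient P[X_{i:n} <= x_p < X_{j:n}] for i < j, via (14.3.8)."""
    return binom_cdf(j - 1, n, p) - binom_cdf(i - 1, n, p)


incomes = sorted([10.8, 12.7, 13.9, 18.1, 19.4, 21.3, 23.5, 24.0, 24.6, 25.0,
                  25.4, 27.7, 30.1, 30.6, 32.3, 33.3, 34.7, 38.8, 40.3, 55.5])
i, j, n, p = 2, 10, 20, 0.25
interval = (incomes[i - 1], incomes[j - 1])   # (x_{2:20}, x_{10:20})
level = quantile_ci_level(i, j, n, p)
print(interval, round(level, 4))  # (12.7, 25.0) 0.9618
```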
For large $n$, a normal approximation may be used. For example, for an upper limit, in equation (14.3.6) set $B(k - 1; n, p) = \Phi(z)$, where $z = (k - 1 + 0.5 - np)/\sqrt{np(1 - p)}$. For a specified level $1 - \alpha$, setting $z = z_{1-\alpha}$ gives an approximate expression for $k$ in terms of $n$,

$$k = 0.5 + np + z_{1-\alpha}\sqrt{np(1 - p)}$$

If $k$ is rounded to the nearest integer, then $x_{k:n}$ is the approximate upper $1 - \alpha$ confidence limit for $x_p$. For the lower limit case, replace $z_{1-\alpha}$ with $z_\alpha$.
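The normal-approximation formula for $k$ is easy to evaluate and to check against the exact level from (14.3.6). In the Python sketch below (function names are ours), $n = 20$ is used only to allow comparison with Example 14.3.2; the approximation is intended for larger $n$.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(t: int, n: int, p: float) -> float:
    """Cumulative binomial probability B(t; n, p) = P[T <= t]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


def approx_upper_k(n: int, p: float, level: float) -> int:
    """k = 0.5 + np + z_level * sqrt(np(1 - p)), rounded to the nearest integer."""
    z = NormalDist().inv_cdf(level)
    return round(0.5 + n * p + z * sqrt(n * p * (1 - p)))


n, p, level = 20, 0.25, 0.95
k = approx_upper_k(n, p, level)
exact = binom_cdf(k - 1, n, p)   # exact P[X_{k:n} >= x_p] from (14.3.6)
print(k, round(exact, 4))  # 9 0.9591
```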


TOLERANCE LIMITS

A function of the sample $L(X)$ is said to be a lower $\gamma$ probability tolerance limit for proportion $p^*$ if

$$P\left[\int_{L(X)}^{\infty} f(x)\, dx \ge p^*\right] = P[1 - F(L(X)) \ge p^*] = P[F(L(X)) \le 1 - p^*] = P[L(X) \le x_{1-p^*}] = \gamma \tag{14.3.9}$$

That is, we wish to have an interval that will contain a prescribed proportion $p^*$ of the population. It is not possible to determine such an interval exactly if $F$ is unknown, but the tolerance interval $(L(X), \infty)$ will contain at least a proportion $p^*$ of the population with confidence level $\gamma$. The proportion $p^*$ is referred to as the content of the tolerance interval. It is clear that a lower $\gamma$ probability tolerance limit for proportion $p^*$ is simply a lower $\gamma$ level confidence limit for the $1 - p^*$ percentile, $x_{1-p^*}$. Thus $L(X) = X_{k:n}$ is a distribution-free lower $\gamma$ tolerance limit for proportion $p^*$ if $k$ and $n$ are chosen so that

$$\gamma(k, n, 1 - p^*) = 1 - B(k - 1; n, 1 - p^*) = \gamma \tag{14.3.10}$$

One may also wish to have a two-sided tolerance interval $(L(X), U(X))$ such that

$$P\{F[U(X)] - F[L(X)] \ge p\} = \gamma$$

A two-sided tolerance interval cannot be obtained from a two-sided confidence interval on a percentile, but a two-sided distribution-free tolerance interval can be obtained in the form $(L(X), U(X)) = (X_{i:n}, X_{j:n})$ by the proper choice of $i$, $j$, and $n$. We need to choose $i$, $j$, and $n$ to satisfy

$$P[F(X_{j:n}) - F(X_{i:n}) \ge p] = P[Z_j - Z_i \ge p] = \gamma$$

where $h(z_1, \ldots, z_n) = n!$, $0 < z_1 < \cdots < z_n < 1$.
A few comments will be made before determining the distribution of $Z_j - Z_i$. The content between two consecutive order statistics is known as a coverage, say

$$W_i = F(X_{i:n}) - F(X_{i-1:n}) = Z_i - Z_{i-1} \tag{14.3.11}$$

with $Z_0 = 0$, and $E(W_i) = 1/(n + 1)$; that is, the expected content between two consecutive order statistics is $1/(n + 1)$. It follows that the expected content between any two order statistics $X_{i:n}$ and $X_{j:n}$, $i < j$, is given by

$$E(Z_j - Z_i) = \frac{j}{n + 1} - \frac{i}{n + 1} = \frac{j - i}{n + 1} \tag{14.3.12}$$

That is, the expected content depends only on the difference $j - i$ and does not depend on which $i$ and $j$ are involved. It turns out in general that the density of $Z_j - Z_i$, or of the sum of any $j - i$ coverages, depends only on $j - i$ and not on $i$ and $j$ separately.

Consider the transformation

$$w_1 = z_1, \quad w_2 = z_2 - z_1, \quad \ldots, \quad w_n = z_n - z_{n-1}$$

with inverse transformation

$$z_k = \sum_{i=1}^{k} w_i \qquad k = 1, 2, \ldots, n$$

The Jacobian is 1, so

$$f(w_1, \ldots, w_n) = n! \qquad w_i > 0, \quad \sum_{i=1}^{n} w_i < 1 \tag{14.3.13}$$
i=1 
This density is symmetric with respect to the $w_i$. That is, the density of any function of the $w_i$ will not depend on which of the $w_i$ are involved. In particular, the density of the variable

$$U_{ij} = F(X_{j:n}) - F(X_{i:n}) = Z_j - Z_i = \sum_{k=i+1}^{j} W_k \tag{14.3.14}$$

depends only on the number of coverages summed, $j - i$, and the density is the same as the density of the sum of the first $j - i$ coverages,

$$\sum_{k=1}^{j-i} W_k = Z_{j-i}$$

The marginal density of $Z_{j-i}$ is given by equation (14.3.3) with $k = j - i$, which is a beta density, so

$$U_{ij} = Z_j - Z_i \sim \text{BETA}(j - i,\; n - j + i + 1) \tag{14.3.15}$$

Expressed in terms of the binomial CDF,

$$P[Z_j - Z_i \ge p] = \sum_{k=0}^{j-i-1}\binom{n}{k} p^k (1 - p)^{n-k} = B(j - i - 1; n, p) \tag{14.3.16}$$

Thus the interval $(X_{i:n}, X_{j:n})$ provides a two-sided $\gamma$ probability tolerance interval for proportion $p$ if $i$ and $j$ are chosen to satisfy

$$B(j - i - 1; n, p) = \gamma$$
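Because (14.3.16) holds for every continuous $F$, a quick simulation from any continuous model should reproduce it. The Python sketch below is an illustration under our own choices: it draws samples from EXP(1), whose CDF is $F(x) = 1 - e^{-x}$, uses a fixed seed, and estimates $P[F(X_{20:20}) - F(X_{1:20}) \ge 0.80]$ for comparison with $B(18; 20, 0.80)$.

```python
import math
import random
from math import comb


def binom_cdf(t: int, n: int, p: float) -> float:
    """Cumulative binomial probability B(t; n, p) = P[T <= t]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


def simulated_content_prob(n: int, i: int, j: int, p: float,
                           reps: int = 20_000, seed: int = 12345) -> float:
    """Monte Carlo estimate of P[F(X_{j:n}) - F(X_{i:n}) >= p] for EXP(1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sorted(rng.expovariate(1.0) for _ in range(n))
        # F(x_j) - F(x_i) = (1 - e^{-x_j}) - (1 - e^{-x_i}) = e^{-x_i} - e^{-x_j}
        content = math.exp(-x[i - 1]) - math.exp(-x[j - 1])
        if content >= p:
            hits += 1
    return hits / reps


n, i, j, p = 20, 1, 20, 0.80
theory = binom_cdf(j - i - 1, n, p)   # (14.3.16): B(18; 20, 0.80)
estimate = simulated_content_prob(n, i, j, p)
print(f"B(j - i - 1; n, p) = {theory:.4f}, simulated = {estimate:.4f}")
```

Replacing the exponential draws with any other continuous distribution (and its CDF) should give a similar estimate, which is the sense in which the interval is distribution-free.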


Theorem 14.3.2 For a continuous random variable $X \sim F(x)$, $L(X) = X_{k:n}$ is a lower $\gamma$ probability tolerance limit for proportion $p^*$, where $\gamma = 1 - B(k - 1; n, 1 - p^*)$. Also $X_{k:n}$ is an upper $\gamma$ probability tolerance limit for proportion $p$, where $\gamma = B(k - 1; n, p)$. The interval $(X_{i:n}, X_{j:n})$ is a two-sided $\gamma$ probability tolerance interval for proportion $p$, where $\gamma = B(j - i - 1; n, p)$.

Example 14.3.3 In Example 14.3.2, we see that the lower confidence limit on $x_p = x_{0.25}$, given by $L(X) = X_{2:20}$, also may be interpreted as a $\gamma = 0.9757$ probability tolerance limit for proportion $p^* = 1 - 0.25 = 0.75$. That is, we are 97.57% confident that at least 75% of the incomes of tall men in this profession will exceed $x_{2:20} = 12.7$ thousands of dollars.

If we were interested in a lower tolerance limit for proportion $p^* = 0.90$, then Table 1 (Appendix C) at $1 - p^* = 0.10$ shows

$$P[X_{1:20} \le x_{0.10}] = 1 - B(0; 20, 0.10) = 0.8784$$

Thus, for example, if a 95% tolerance limit is desired for proportion 0.90, a larger sample size is required.

We see that $(L(X), U(X)) = (X_{1:20}, X_{20:20})$ provides a two-sided tolerance interval for proportion 0.80 with probability level $\gamma = B(20 - 1 - 1; 20, 0.80) = 0.9308$.

For specified $\gamma$ and $p$ it is of interest to know what sample size is required so that $(X_{1:n}, X_{n:n})$ will provide the desired tolerance interval. Setting

$$B(n - 2; n, p) = \gamma$$

yields $n$ as the solution to

$$n p^{n-1} - (n - 1) p^n = 1 - \gamma$$

For this $n$, we find $P[F(X_{n:n}) - F(X_{1:n}) \ge p] = \gamma$.
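The sample-size question can be answered by a direct search. The Python sketch below uses helper names of our own; it finds the smallest $n$ for which $(X_{1:n}, X_{n:n})$ reaches level $\gamma$, using $B(n - 2; n, p) = 1 - n p^{n-1} + (n - 1)p^n$, and also the smallest $n$ for which $X_{1:n}$ is a lower $\gamma$ tolerance limit, where Theorem 14.3.2 with $k = 1$ gives $\gamma = 1 - (p^*)^n$. For $p = p^* = 0.90$ and $\gamma = 0.95$ this yields 46 and 29, which quantifies the "larger sample size" mentioned in Example 14.3.3.

```python
def two_sided_level(n: int, p: float) -> float:
    """gamma for (X_{1:n}, X_{n:n}): B(n - 2; n, p) = 1 - n p^(n-1) + (n - 1) p^n."""
    return 1 - n * p ** (n - 1) + (n - 1) * p ** n


def min_n_two_sided(p: float, gamma: float) -> int:
    """Smallest n with B(n - 2; n, p) >= gamma, i.e. n p^(n-1) - (n - 1) p^n <= 1 - gamma."""
    n = 2
    while two_sided_level(n, p) < gamma:
        n += 1
    return n


def min_n_lower(p_star: float, gamma: float) -> int:
    """Smallest n for which X_{1:n} is a lower gamma tolerance limit for proportion p*.

    With k = 1, Theorem 14.3.2 gives gamma = 1 - B(0; n, 1 - p*) = 1 - (p*)^n."""
    n = 1
    while 1 - p_star ** n < gamma:
        n += 1
    return n


print(round(two_sided_level(20, 0.80), 4))  # 0.9308, as in Example 14.3.3
print(min_n_two_sided(0.90, 0.95))          # 46
print(min_n_lower(0.90, 0.95))              # 29
```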


14.4 TWO-SAMPLE SIGN TEST

The one-sample sign test is easily modified for use in a paired-sample problem, and this approach can be used as an alternative to the paired-sample $t$ test when normality cannot be assumed. Assume that $n$ independent pairs of observations $(x_i, y_i)$, $i = 1, \ldots, n$, are available, and let $T$ equal the number of times $X_i$ is less than $Y_i$. In terms of the differences $X_i - Y_i$, we say that $T$ = number of negative signs of $X_i - Y_i$, $i = 1, \ldots, n$. Again we will assume that $X$ and $Y$ are continuous random variables so that $P[X = Y] = 0$, but if an observed $x_i - y_i = 0$ because of roundoff or other reasons, then that outcome will be discarded and the number of pairs reduced by one. The sign test is sensitive to shifts in location, and it should be useful in detecting differences in means or differences in medians, although strictly speaking it is a test of whether the median of the differences is zero.

Theorem 14.4.1 Suppose that $X$ and $Y$ are continuous random variables and $n$ independent pairs of observations $(x_i, y_i)$ are available. Consider

$$H_0: P[X < Y] = P[X > Y] = 1/2 \qquad (H_0: \text{median}(X - Y) = 0)$$

against

$$H_1: P[X < Y] < P[X > Y] \qquad (H_1: \text{median}(X - Y) > 0)$$

Let $t$ be the number of negative signs of $(x_i - y_i)$, $i = 1, \ldots, n$. Then, under $H_0$,

$$T \sim \text{BIN}(n, 1/2)$$

and a size $\alpha$ test of $H_0$ against $H_1$ is to reject $H_0$ if $B(t; n, 1/2) \le \alpha$.
Example 14.4.1 A campaign manager wishes to measure the effectiveness of a certain politician's
speech. Eighteen people were selected at random and asked to rate the politician
before and after the speech. Of these, 11 had a positive reaction, four had a
negative reaction, and three had no reaction. To test H_0: no effect against
H_a: positive effect, we discard the three ties, so we have t = 4 negative reactions
and n = 15, and the p-value for the test is

B(4; 15, 0.5) = 0.0592

Thus, at an error level of about 0.06, there is statistical evidence that the speech
was effective for the sampled population.
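The p-value in Example 14.4.1 is just a binomial CDF evaluated at the number of negative signs after ties are dropped. The Python sketch below reproduces it. The function names are hypothetical, and because the example reports only counts, the 0/1 "before/after" ratings are an assumed coding with 11 improvements, 4 declines, and 3 ties, not the original data.

```python
from math import comb


def binom_cdf(t: int, n: int, p: float = 0.5) -> float:
    """B(t; n, p) = P[BIN(n, p) <= t]."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(t + 1))


def two_sample_sign_test(x, y):
    """One-sided sign test of H0: median(X - Y) = 0 against Ha: median(X - Y) > 0.

    Pairs with x_i == y_i are discarded. Returns (t, n, p_value), where t is the
    number of negative differences x_i - y_i and p_value = B(t; n, 1/2)."""
    diffs = [xi - yi for xi, yi in zip(x, y) if xi != yi]
    n = len(diffs)
    t = sum(1 for d in diffs if d < 0)
    return t, n, binom_cdf(t, n)


# Assumed 0/1 coding of Example 14.4.1: 11 improved, 4 worsened, 3 unchanged.
after = [1] * 11 + [0] * 4 + [0] * 3
before = [0] * 11 + [1] * 4 + [0] * 3
t, n, p_value = two_sample_sign_test(after, before)
print(t, n, round(p_value, 4))
```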


14.5 WILCOXON PAIRED-SAMPLE SIGNED-RANK TEST


The two-sample sign test makes use only of the signs of the differences x_i − y_i. A
test would be expected to be more powerful or more efficient if it also makes
some use of the magnitude of the differences, which the Wilcoxon signed-rank
test does.

Let d_i = x_i − y_i for i = 1, ..., n denote the differences of the matched pairs;
then rank the differences without regard to sign. That is, rank the |d_i| according
to magnitude, but keep track of the signs associated with each one. Now replace
the |d_i| with their ranks, and let T be the sum of the ranks of the positive differ-
ences. Again this test statistic will be sensitive to differences in location between
the two populations. In the sign test the positive signs and negative signs were
assumed to be equally likely to occur under H_0. In this case, to determine a
critical value for T, we need to assume that the positive signs and negative signs
are equally likely to be assigned to the ranks under H_0. The signs will be equally
likely if the joint density of X and Y is symmetric in the variables; that is, we
could consider H_0: F(x_i, y_i) = F(y_i, x_i). This corresponds to the distribution of
the differences being symmetric about zero, or H_0: F_D(d) = 1 − F_D(−d). Note
that the probability of a negative sign is then F_D(0) = 1/2, and the median is m_D = 0.
Also, for a symmetric distribution, the mean and the median are the same. Thus,
under the symmetry assumptions mentioned, this test may be considered a test
for equality of means for the two populations.

In general, the signed-rank test is considered a test of the equality of two popu-
lations and has good power against the alternative of a difference in location, but
the specific assumption under H_0 is that any sequence of signs is equally likely to
be associated with the ranked differences. That is, if the alternative is stated as
H_a: E(X) < E(Y), then rejection of H_0 could occur because of some other lack of
symmetry. For a one-sided alternative, say H_a: E(X) < E(Y), one would reject H_0
for small values of T, the sum of positive ranks of the differences d_i = x_i − y_i. To
illustrate how a critical value for T may be computed, consider n = 8 pairs; then
there are 2^8 = 256 equally likely possible sequences of pluses and minuses that
can be associated with the eight ranks. If we are interested in small values of T,
then we will order these 256 possible outcomes by putting the ones associated
with the smallest values of T in the critical region. The outcomes associated with
small values of T are illustrated in Table 14.1.


TABLE 14.1 Signs associated with the Wilcoxon paired-sample signed-rank test

                  Ranks
  1   2   3   4   5   6   7   8      T
  −   −   −   −   −   −   −   −      0
  +   −   −   −   −   −   −   −      1
  −   +   −   −   −   −   −   −      2
  −   −   +   −   −   −   −   −      3
  +   +   −   −   −   −   −   −      3
  −   −   −   +   −   −   −   −      4
  +   −   +   −   −   −   −   −      4


Placing the first five possible outcomes in the critical region corresponds to
rejecting H_0 if T ≤ 3, and this gives α = 5/256 = 0.0195. Rejecting H_0 if T ≤ 4
results in a significance level of α = 7/256 = 0.027, and so on.
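The exact null distribution of T does not require listing all 2^n sign patterns. Under H_0 each rank i independently either adds i to T or adds nothing, so the number of patterns giving T = t equals the number of subsets of {1, ..., n} that sum to t. The Python sketch below counts these subsets with a subset-sum recursion. This recursion is our choice of technique rather than a method given in the text, and the function names are hypothetical. It reproduces the 5/256 and 7/256 levels above.

```python
def signed_rank_null_counts(n: int) -> list[int]:
    """counts[t] = number of the 2**n equally likely sign patterns with T = t,
    where T is the sum of the ranks that carry a plus sign."""
    max_t = n * (n + 1) // 2
    counts = [1] + [0] * max_t  # before any rank is considered, only T = 0 occurs
    for rank in range(1, n + 1):
        # Each rank is either negative (T unchanged) or positive (T grows by rank).
        # Looping t downward ensures each rank is added at most once.
        for t in range(max_t, rank - 1, -1):
            counts[t] += counts[t - rank]
    return counts


def signed_rank_cdf(t: int, n: int) -> float:
    """Exact P[T <= t] under H0."""
    counts = signed_rank_null_counts(n)
    return sum(counts[: t + 1]) / 2 ** n


counts = signed_rank_null_counts(8)
print(counts[:5], signed_rank_cdf(3, 8), signed_rank_cdf(4, 8))
```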

Conservative critical values, t_α, are provided in Table 12 (Appendix C) for the
usual prescribed α levels for n ≤ 20. The true Type I error may be slightly less
than α because of discreteness. A normal approximation is adequate for n > 20.
The mean and variance of T may be determined as follows.

Without loss of generality, the subscripts of the original differences can be
rearranged so that the absolute differences are in ascending order, |d_1| < ··· <
|d_n|, in which case the rank of |d_i| is i, and the signed-rank statistic can be
written as T = Σ_{i=1}^{n} iU_i, where U_i = 1 if the difference whose absolute value has
rank i is positive, and U_i = 0 if it is negative. Under H_0, the variables U_1, ..., U_n
are independent identically distributed Bernoulli variables, U_i ~ BIN(1, 1/2).

Thus,

$$E(T) = E\Big(\sum_{i=1}^{n} iU_i\Big) = \sum_{i=1}^{n} i\,E(U_i) = \frac{1}{2}\sum_{i=1}^{n} i = \frac{1}{2}\cdot\frac{n(n+1)}{2} = \frac{n(n+1)}{4}$$

and

$$\operatorname{Var}(T) = \operatorname{Var}\Big(\sum_{i=1}^{n} iU_i\Big) = \sum_{i=1}^{n} i^2 \operatorname{Var}(U_i) = \frac{1}{4}\cdot\frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}$$

For large n,

$$\frac{T - E(T)}{\sqrt{\operatorname{Var}(T)}} \xrightarrow{d} Z \sim N(0, 1)$$

Theorem 14.5.7
Let d_i = x_i − y_i, i = 1, ..., n, denote the differences of n independent matched
pairs. Rank the d_i without regard to sign, and let

T = Sum of ranks associated with positive signed differences

A (conservative) size α test of

H_0: F(x_i, y_i) = F(y_i, x_i)   (H_0: F_D(d) = 1 − F_D(−d))

against

H_a: X is stochastically smaller than Y   (P[X > a] ≤ P[Y > a], all a)

is to reject H_0 if t ≤ t_α (where t_α is given in Table 12).

This test also may be used as a test of the hypothesis that f_D(d) is symmetric
with μ_D = E(X) − E(Y) = 0 against the alternative E(X) < E(Y).
For a two-sided alternative, let t* be the smaller sum of like-signed ranks; then
a (conservative) size α test is to reject H_0 if t* ≤ t_{α/2}.


For n ≥ 20, approximately

$$\frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \sim N(0, 1)$$

Note that the signed-rank test also can be used as a one-sample test for the
median of a symmetric population. Consider the null hypothesis H_0 that X is a
continuous random variable with a symmetric distribution about the median m_0.
Let d_i = x_i − m_0 and T = sum of positive signed ranks as above. Then a size α
test of H_0 against the alternative H_a: m < m_0 is to reject H_0 if t ≤ t_α. For
H_a: m > m_0, let T = sum of ranks for the negative d_i.

If the differences actually are normally distributed, then a paired-sample t test
is applicable for testing

H_0: μ_D = E(X) − E(Y) = 0 against H_a: E(X) − E(Y) < 0

If the Wilcoxon signed-rank test is used in this case, its asymptotic relative effi-
ciency is 3/π = 0.955.

Also note that any observed d_i = 0 should be discarded and the sample size
reduced. If there is a tie in the value of two or more |d_i|, then it is common
practice to use the average of their ranks for all of the tied differences in the
group.


Example 14.5.1 To illustrate the one-sample signed-rank test, consider again Example 14.2.1,
where we wish to test the median income H_0: m = 24.5 thousand dollars against
H_a: m > 24.5. If we assume that the distribution of incomes is symmetric, then
the signs of x_i − m_0 are equally likely to be positive or negative, and we can
apply the Wilcoxon paired-sample signed-rank test. The test in this case also will
be a test of H_0: μ = 24.5, because the mean and median are the same for sym-
metric distributions.

Note that if the assumption of symmetry is not valid, then H_0 could be rejected
even though m = m_0, because lack of symmetry could cause the signs of x_i − m_0
not to be equally likely to be positive or negative. Indeed, if the median m = m_0
can be assumed known, then the Wilcoxon signed-rank test can be used as a test
of symmetry. That is, we really are testing both that the distribution is symmetric
and that it has median m_0.

In our example we first determine the ranks of the d_i according to their abso-
lute value |d_i| = |x_i − m_0|, as follows:

d_i           −13.7  −11.8  −10.6  −6.4  −5.1  −3.2  −1.0  −0.5   0.1   0.5
rank(|d_i|)      17     16     15    11     8   6.5     5   2.5     1   2.5

d_i             0.9    3.2    5.6   6.1   7.8   8.8  10.2  14.3  15.8  31.0
rank(|d_i|)       4    6.5      9    10    12    13    14    18    19    20

For H_a: m > 24.5, we reject H_0 for a small sum of negative signed ranks, where
for this set of data

T⁻ = 2.5 + 5 + 6.5 + 8 + 11 + 15 + 16 + 17 = 81


From Table 12 (Appendix C) we see that T⁻ = 81 gives a p value of 0.20, so we
cannot reject H_0 at the usual prescribed significance levels. However, this test
does give some indication that the hypothesis that the incomes are symmetrically
distributed about a median of $24,500 is false. The lack of symmetry may be the
greater source for disagreement with H_0 in this example.
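The ranking in Example 14.5.1 involves two ties (|−0.5| = |0.5| and |−3.2| = |3.2|), which receive average ranks. The Python sketch below (hypothetical helper names) drops zero differences, assigns average ranks to the |d_i|, and sums the ranks by sign. The differences are the d_i = x_i − 24.5 from the table above.

```python
def average_ranks(values):
    """Ranks 1..n of values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(d):
    """Return (T+, T-), the sums of the ranks of |d_i| over positive and negative d_i.
    Zero differences are discarded first."""
    d = [v for v in d if v != 0]
    ranks = average_ranks([abs(v) for v in d])
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    t_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return t_plus, t_minus


d = [-13.7, -11.8, -10.6, -6.4, -5.1, -3.2, -1.0, -0.5, 0.1, 0.5,
     0.9, 3.2, 5.6, 6.1, 7.8, 8.8, 10.2, 14.3, 15.8, 31.0]
t_plus, t_minus = signed_rank_sums(d)
print(t_plus, t_minus)
```

As a check, T⁺ + T⁻ = n(n + 1)/2 = 210.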




Example 14.5.2 To illustrate the paired-sample case, suppose that in the previous example a
second sample, y_1, ..., y_20, of 20 men under six feet tall is available, and we wish
to see if there is statistical evidence that the median income of tall men is greater
than the median income of shorter men, H_0: x_{0.50} = y_{0.50} against H_a: x_{0.50} >
y_{0.50}. In this two-sample case, if the medians are equal, then there would be little
reason to suspect the assumption of symmetry. Note that if the two samples are
independent, then the paired-sample test still can be applied, but a more powerful
independent-samples test will be discussed later. If the samples are paired in a
meaningful way, then the paired-sample test may be preferable to an independent-
samples test. For example, the pairs could be of short and tall men who have the
same level of education or the same age. Of course, the samples would not be
independent in that case.

Consider the following 20 observations, where it is assumed that the first
observation was paired with the first (ordered) observation in the first sample,
and so on. The differences d_i = x_i − y_i also are recorded.


13.9181 194 21.3235. 240 246 25.0 


10.7.-.49.2 . 18.0....20.1 20.0..°621.2 21.3 25.5 


3.2 24 1.4 1.2 3.5 2.8 3.3 -0.5 


12 5 8 6 14 10 13 3 


27.7. 30.1 30.6. 32.3. .33.3...34.7°. 388 403 55.5 


26.4. 24.5. °27.5 25.0. 28.0. 37.4. 43.8 358 60.9 


13°56 31 #73 53..-27. -50 45 -5A4 


rank ({d,{) : 11 20 «+17 9 16 15 18 


For the alternative hypothesis as stated, we reject H_0 if the sum of ranks of the
negative differences is small. An alternative approach would have been to relabel,
or to let d_i = y_i − x_i; then we would have used the sum of positive signed ranks.
Note also that T⁺ + T⁻ = n(n + 1)/2, which is useful for computing the smaller
sum of like-signed ranks for a two-sided alternative. We have T⁻ = 1 + 5 + 3 + 2
+ 9 + 16 + 18 = 54. Because t_{0.05} = 60, according to this set of
data we can reject H_0 at the 0.05 level.

The approximate large-sample 0.05 critical value for this case is given by

$$t_{0.05} \approx z_{0.05}\sqrt{\frac{20(21)(41)}{24}} + \frac{20(21)}{4} = 60.9$$
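The large-sample critical value combines E(T) = n(n + 1)/4 and Var(T) = n(n + 1)(2n + 1)/24 with a standard normal percentile. The Python sketch below computes it. The function name is hypothetical, and `statistics.NormalDist().inv_cdf` supplies z_α, which is negative for α < 0.5, matching the usage above.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_normal_critical_value(n: int, alpha: float) -> float:
    """Approximate lower-tail critical value
    t_alpha = n(n+1)/4 + z_alpha * sqrt(n(n+1)(2n+1)/24)."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z_alpha = NormalDist().inv_cdf(alpha)  # 100*alpha-th percentile of N(0, 1)
    return mean + z_alpha * sd


print(round(signed_rank_normal_critical_value(20, 0.05), 1))
```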
14.6 PAIRED-SAMPLE RANDOMIZATION TEST

TABLE 14.2 


In the Wilcoxon signed-rank test, the actual observations were replaced by ranks.
The advantage of this is that predetermined critical values can be tabulated as a
function only of n and α. It is possible to retain the actual observations and
develop a probability structure for testing by assuming that all ordered outcomes
of data are equally likely under H_0. For example, if two samples are selected
from the same population, then the assignment of which came from population 1
and which from population 2 could be made at random. There would be N =
(n_1 + n_2)!/(n_1! n_2!) equally likely possible assignments to the given set of data.
Thus a size α = k/N test of equality can be obtained by choosing k of these
outcomes to be included in our critical region. Of course, we want to pick the k
outcomes that are most likely to occur when the alternative hypothesis is true.
Thus, we need some test statistic, T, that will identify what order we want to use
in putting the possible outcomes into the critical region, and we need to know the
critical value for T for a given α. In the signed-rank test, we used the sum of the
positive signed ranks, and we were able to tabulate critical values. We now may
use the sum of the positive differences as our test statistic, although we cannot
determine the proper critical value until we know all the values of the d_i. For
example, suppose the following eight differences are observed in a paired-sample problem:


+20, —-10, +8, -7, +5, —4,-4+2,; —1 


There are 2^8 = 256 possible ways of assigning pluses and minuses to eight
numbers, and each outcome is equally likely under the hypothesis that the D_i are
symmetrically distributed about m_D = 0. We may rank these possible outcomes
according to T, the sum of the positive d_i; the first dozen outcomes are shown in
Table 14.2.

Differences d_i for the paired-sample randomization test

 −20  −10   −8   −7   −5   −4   −2   −1    T = 0
 −20  −10   −8   −7   −5   −4   −2   +1    T = 1
 −20  −10   −8   −7   −5   −4   +2   −1    T = 2
 −20  −10   −8   −7   −5   −4   +2   +1    T = 3
 −20  −10   −8   −7   −5   +4   −2   −1    T = 4
 −20  −10   −8   −7   −5   +4   −2   +1    T = 5
 −20  −10   −8   −7   +5   −4   −2   −1    T = 5
 −20  −10   −8   −7   +5   −4   −2   +1    T = 6
 −20  −10   −8   −7   −5   +4   +2   −1    T = 6
 −20  −10   −8   −7   −5   +4   +2   +1    T = 7
 −20  −10   −8   −7   +5   −4   +2   −1    T = 7
 −20  −10   −8   +7   −5   −4   −2   −1    T = 7
Thus to test H_0: m_D = 0 (and symmetry) against a one-sided alternative
H_a: m_D < 0, we reject H_0 for small T. Given these eight numerical values, a size
α = 12/256 = 0.047 test is to reject H_0 if T ≤ 7. Thus we can reject H_0 at this α
for the data as presented. That is, there were only 12 cases as extreme as the one
observed (using the statistic T).
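For a handful of differences, the randomization distribution can be enumerated directly. The Python sketch below (hypothetical function name) runs through all 2^8 sign assignments of the absolute differences 20, 10, 8, 7, 5, 4, 2, 1 and counts those whose sum of positive values is at most 7. It recovers the 12 outcomes behind α = 12/256.

```python
from itertools import product


def randomization_lower_tail_count(abs_diffs, t_crit):
    """Number of the 2**n equally likely sign assignments whose sum of
    positive differences T is <= t_crit."""
    count = 0
    for signs in product((-1, 1), repeat=len(abs_diffs)):
        t = sum(a for a, s in zip(abs_diffs, signs) if s > 0)
        if t <= t_crit:
            count += 1
    return count


abs_d = [20, 10, 8, 7, 5, 4, 2, 1]
k = randomization_lower_tail_count(abs_d, 7)
print(k, k / 2 ** len(abs_d))
```

Unlike the signed-rank table, this count depends on the observed magnitudes, so it has to be recomputed for each data set.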

Tests such as the one described, based on the actual observations, have high
efficiency, but the test is much more convenient if the observations are replaced
by ranks so that fixed critical values can be tabulated. A normal approximation
can be used for larger n, and quite generally the normal theory test procedures
can be considered as approximations to the corresponding “exact” randomiza-
tion tests. For each normal test described, the same test statistic can be used to
order the set of possible outcomes produced under the randomization concept
that gives equally likely outcomes under H_0. Approximating the distribution of
the statistic then returns us to a normal-type test. For small n, exact critical
values can be computed as described, but these are in general quite inconvenient
to determine.


14.7 WILCOXON AND MANN-WHITNEY (WMW) TESTS


We now will consider a nonparametric analog to the t test for independent
samples. The Wilcoxon rank sum test is designed to be sensitive to differences in
location, but strictly speaking it is a test of the equality of two distributions. To
illustrate the case of a one-sided alternative, consider H_0: F_X = F_Y against
H_a: F_X > F_Y (X stochastically smaller than Y). Suppose that n_1 observations
x_1, ..., x_{n_1} and n_2 observations y_1, ..., y_{n_2} are available from the two popu-
lations. Combine these two samples and then rank the combined samples in
ascending order. Under H_0 any arrangement of the x's and y's is equally likely to
occur, and there are N = (n_1 + n_2)!/(n_1! n_2!) possible arrangements. Again we can
produce a size α = k/N test by choosing k of the possible arrangements to include
in a critical region.

We wish to select arrangements to go into the critical region that are likely to
occur when H_a is true, to minimize our Type II error. The Wilcoxon test says to
replace the observations with their combined-sample ranks, and then reject H_0 if
the sum of the ranks of the x's is small. That is, the order of preference for
including an arrangement in the critical region is based on W_x = Σ rank(x's).

;) = 126 possible arrange- 

ments; the ones with the smallest values of W, are shown in Table 14.3. 

A size « = 7/126 = 0.056 test is achieved by rejecting Hy if the observed 

w, < 13. Note that W. + W, = (ny + n\n, +n, + 1/2. 


For example, ifn, = 4 and n, =5, then there are ( 
TABLE 14.3 Arrangements of x's and y's for the Wilcoxon–Mann–Whitney tests

 1  2  3  4  5  6  7  8  9    W_x   U_x
 x  x  x  x  y  y  y  y  y     10     0
 x  x  x  y  x  y  y  y  y     11     1
 x  x  y  x  x  y  y  y  y     12     2
 x  x  x  y  y  x  y  y  y     12     2
 x  y  x  x  x  y  y  y  y     13     3
 x  x  y  x  y  x  y  y  y     13     3
 x  x  x  y  y  y  x  y  y     13     3
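Each arrangement is determined by which combined-sample ranks the x's occupy, so the null distribution of W_x can be built by looping over all n_1-element subsets of {1, ..., n_1 + n_2}. The Python sketch below (hypothetical function name) does this for n_1 = 4, n_2 = 5 and recovers the 7 arrangements with W_x ≤ 13 out of 126.

```python
from itertools import combinations


def wmw_arrangement_counts(n1, n2):
    """Map each possible W_x (sum of the combined-sample ranks of the x's) to the
    number of the C(n1 + n2, n1) equally likely arrangements that produce it."""
    counts = {}
    for x_ranks in combinations(range(1, n1 + n2 + 1), n1):
        w = sum(x_ranks)
        counts[w] = counts.get(w, 0) + 1
    return counts


counts = wmw_arrangement_counts(4, 5)
total = sum(counts.values())
k = sum(c for w, c in counts.items() if w <= 13)
print(total, k, round(k / total, 3))
```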


Mann and Whitney suggested using the statistic

U_x = Number of times a y precedes an x

It turns out that U_x and W_x are equivalent statistics. The minimum value of W_x
is n_1(n_1 + 1)/2, and this corresponds to U_x = 0. If one y precedes one x, then
U_x = 1 and this increases W_x by 1. Similarly, each time a y precedes an x, this
increases W_x by one more, so that

$$W_x = \frac{n_1(n_1 + 1)}{2} + U_x$$

Similarly,

$$W_y = \frac{n_2(n_2 + 1)}{2} + U_y$$

where U_y is the number of times an x precedes a y. Note that U_x + U_y = n_1 n_2.

For the alternative H_a: X stochastically larger than Y, we would reject H_0 if
W_x is large or if W_y and U_y are small. The seven sequences corresponding to the
smallest values of W_y in the example are the seven sequences in Table 14.3 ranked
in the reverse order. In this case W_y = 18 for the last sequence, for example, and
U_y = 18 − [5(6)/2] = 3 = U_x for the original table. Indeed, for a given sequence,
U_x computed under the first order of ranking is the same as U_y for the reverse
ranking, because the same number of interchanges between the x's and y's occurs
whichever direction the ranks are applied to the sequences. Thus, U_x and U_y are
identically distributed and the same critical values can be used with either U_x or
U_y. Sometimes the subscript will be suppressed and the notation U will be used.
The notations u_x and u_y will refer to the observed values of U_x and U_y, respec-
tively, and u_α is the notation for a 100α-th percentile of U (where U ~ U_x ~ U_y).

Table 13A (Appendix C) gives P[U_x ≤ u] = P[U_y ≤ u] for values of m = min
(n_1, n_2) and n = max (n_1, n_2) less than or equal to 8. Table 13B (Appendix C)
gives critical values u_α such that P[U ≤ u_α] ≤ α for 9 ≤ n ≤ 14. A normal
approximation may be used for larger sample sizes.
Let X_1, ..., X_{n_1} and Y_1, ..., Y_{n_2} be independent random samples. Then for an
observed value of U_x, reject H_0: F_X = F_Y in favor of H_a: F_X > F_Y (X stochasti-
cally smaller than Y) if P[U ≤ u_x] ≤ α, or if u_x ≤ u_α. Reject H_0 in favor of
H_a: F_X < F_Y (X stochastically larger than Y) if P[U ≤ u_y] ≤ α, or if u_y ≤ u_α.
Reject H_0 in favor of a two-sided alternative H_a: F_X ≠ F_Y if P[U ≤ u_x] ≤ α/2 or
P[U ≤ u_y] ≤ α/2. Alternately, reject H_0 against the two-sided alternative if
min (u_x, u_y) = min (u_x, n_1 n_2 − u_x) ≤ u_{α/2}.


A normal approximation for larger sample sizes may be determined as follows.
Let

$$Z_{ij} = \begin{cases} 0 & X_i < Y_j \\ 1 & X_i > Y_j \end{cases} \qquad (14.7.1)$$

Then

$$U = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} Z_{ij} \qquad (14.7.2)$$

Under H_0,

$$E(Z_{ij}) = 1 \cdot P[Z_{ij} = 1] = \frac{1}{2} \qquad (14.7.3)$$

and

$$E(U) = \frac{n_1 n_2}{2} \qquad (14.7.4)$$

The expected values of products of the Z_ij are required to determine the
variance of U. For example, if j ≠ k, then

$$E(Z_{ij}Z_{ik}) = P[Z_{ij} = 1, Z_{ik} = 1] = P[X_i > Y_j,\ X_i > Y_k] = \frac{2}{3!} = \frac{1}{3} \qquad (14.7.5)$$

There are two ways to have a success in the 3! arrangements of X_i, Y_j, and Y_k. It
can be shown (see Exercise 26) that

$$\operatorname{Var}(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} \qquad (14.7.6)$$

Thus the normal approximation for the α level critical value is

$$u_\alpha \approx \frac{n_1 n_2}{2} - z_{1-\alpha}\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} \qquad (14.7.7)$$

It is possible to express the exact distribution of U recursively, but that will not
be considered here. If ties occur and their number is not excessive, then it is
common practice to assign the average of the ranks to each tied observation.
Other adjustments also have been studied for the case of ties. The asymptotic
relative efficiency of the WMW test compared to the usual t test under normal
assumptions is 3/π = 0.955.
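Equations (14.7.4) and (14.7.6) can be confirmed for small samples by computing U = Σ Z_ij for every equally likely arrangement and taking exact moments. The Python sketch below (hypothetical function name) uses `fractions.Fraction`, so the comparison is exact rather than approximate.

```python
from fractions import Fraction
from itertools import combinations


def mann_whitney_u_moments(n1, n2):
    """Exact mean and variance of U = sum_ij Z_ij (Z_ij = 1 if X_i > Y_j) over all
    equally likely arrangements of n1 x's and n2 y's."""
    N = n1 + n2
    values = []
    for x_pos in combinations(range(N), n1):
        x_set = set(x_pos)
        y_pos = [p for p in range(N) if p not in x_set]
        # Positions stand for the ascending order of the combined observations.
        u = sum(1 for xp in x_pos for yp in y_pos if xp > yp)
        values.append(u)
    m = len(values)
    mean = Fraction(sum(values), m)
    var = Fraction(sum(v * v for v in values), m) - mean ** 2
    return mean, var


mean, var = mann_whitney_u_moments(4, 5)
print(mean, var, Fraction(4 * 5, 2), Fraction(4 * 5 * (4 + 5 + 1), 12))
```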


Example 14.7.1 The times to failure of airplane air conditioners for two different airplanes were
recorded as follows:

x   23  261   87    7  120   14   62   47  225   71  246   21
y   55  320   56  104  220  239   47  246  176  182   33

We wish to test H_0: F_X = F_Y against H_a: X is stochastically smaller than Y.
This alternative could be interpreted as H_a: μ_X < μ_Y, if it is assumed that the
distributions are otherwise the same. Associating ranks with the combined
samples gives the following results.

Sample    x    x    x    x    y    y    x    y    y    x    x    x
Value     7   14   21   23   33   47   47   55   56   62   71   87
Rank      1    2    3    4    5  6.5  6.5    8    9   10   11   12

Sample    y    x    y    y    y    x    y    x    y    x    y
Value   104  120  176  182  220  225  239  246  246  261  320
Rank     13   14   15   16   17   18   19 20.5 20.5   22   23

We have n_1 = 12, n_2 = 11, and the sum of the ranks of the x's is

w_x = 1 + 2 + 3 + 4 + 6.5 + ··· = 124

and u_x = w_x − n_1(n_1 + 1)/2 = 124 − 78 = 46.

For the given alternative, we wish to reject H_0 if W_x or U_x is small. From Table
13B (Appendix C), the α = 0.10 critical value is u_{0.10} = 44; because 46 > 44, we
cannot reject H_0 at the α = 0.10 significance level.

To illustrate the asymptotic normal approximation, E(U) = 66, Var(U) = 264,
and the approximate p-value for this test is

$$P[U \le 46] \approx \Phi\left(\frac{46 - 66}{\sqrt{264}}\right) = \Phi(-1.23) = 0.1093$$
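The computations in Example 14.7.1 can be reproduced by ranking the combined sample with average ranks for ties, summing the x ranks, and applying the normal approximation based on (14.7.4) and (14.7.6). The Python sketch below uses hypothetical helper names. Like the text, it uses the untied variance formula and no continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Map each distinct value to its combined-sample rank, averaging over ties."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        below = sum(1 for c in ordered if c < v)
        ties = ordered.count(v)
        ranks[v] = below + (ties + 1) / 2
    return ranks


def wmw_lower_tail(x, y):
    """Return (W_x, U_x, approximate P[U <= U_x]) via the normal approximation."""
    n1, n2 = len(x), len(y)
    rank_of = midranks(list(x) + list(y))
    w_x = sum(rank_of[v] for v in x)
    u_x = w_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w_x, u_x, NormalDist().cdf((u_x - mean) / sd)


x = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21]
y = [55, 320, 56, 104, 220, 239, 47, 246, 176, 182, 33]
w_x, u_x, p = wmw_lower_tail(x, y)
print(w_x, u_x, round(p, 4))
```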


CORRELATION TESTS—TESTS OF INDEPENDENCE 


Suppose that we have n pairs of observations (x_i, y_i) from a continuous bivariate
distribution function F(x, y) with continuous marginal distributions F_1(x) and
F_2(y). We wish to test for independence of X and Y, H_0: F(x, y) = F_1(x)F_2(y),
against, say, the alternative of a positive correlation.
For a given set of observations there are n! possible pairings, which are all
equally likely under H_0 that X and Y are independent. For example, we may
consider a fixed ordering of the y's; then there are n! permutations of the x's that
can be paired with the y's. Let us consider a test that is based on a measure of
relationship known as the sample correlation coefficient,

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}}$$

That is, for a size α = k/n! test of H_0 against the alternative of a positive
correlation, we will compute r for each of the n! possible permutations, and then
place the k permutations with the largest values of r in the critical region. If the
observed ordering in our sample is one of these permutations, then we reject H_0.
Note that x̄, ȳ, s_x, and s_y do not change under permutations of the observations,
so we may equivalently consider

t = Σ x_i y_i

as our test statistic.

Again it becomes too tedious for large n to compute t for all n! permutations
to determine the critical value for T. We may use a normal approximation for
large n, and for smaller n we may again consider replacing the observations with
their ranks so that fixed critical values can be computed and tabulated once and
for all.
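For very small n, the exact permutation test just described can be carried out by brute force. The Python sketch below (hypothetical function name, hypothetical toy data) computes the upper-tail p-value of t = Σ x_i y_i over all n! pairings. Because t orders the pairings the same way r does, this is equivalent to ordering by r. The cost grows as n!, which is why the text turns to approximations and ranks.

```python
from itertools import permutations


def exact_permutation_pvalue(x, y):
    """Upper-tail p-value P[T >= t_obs] over all n! pairings of the x's with the
    fixed y's, where T = sum x_i * y_i. Practical only for small n."""
    t_obs = sum(a * b for a, b in zip(x, y))
    ge = total = 0
    for perm in permutations(x):
        total += 1
        if sum(a * b for a, b in zip(perm, y)) >= t_obs:
            ge += 1
    return ge / total


# Hypothetical toy data, for illustration only.
print(exact_permutation_pvalue([1, 2, 3, 4], [1, 2, 3, 4]))
```

With non-integer data, a small tolerance in the `>=` comparison may be needed to guard against floating-point ties.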


NORMAL APPROXIMATION 


For fixed y_i, let us consider the moments of the x_i relative to the n! equally likely
permutations. The notation is somewhat ambiguous, but suppose we let X_i
denote a random variable that takes on the n values x_1, ..., x_n, each with prob-
ability 1/n. Similarly, the variable X_iX_j will take on the n(n − 1) values x_i x_j for
i ≠ j, each with probability 1/[n(n − 1)], and so on. Now

$$\operatorname{Var}(X_i) = E(X_i^2) - \bar{x}^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{(n-1)s_x^2}{n}$$

and

$$\operatorname{Cov}(X_i, X_j) = E(X_iX_j) - \bar{x}^2 = \frac{\sum_{i \ne j} x_i x_j}{n(n-1)} - \bar{x}^2 = -\frac{s_x^2}{n}$$

Now

$$E(T) = E\Big(\sum X_i y_i\Big) = \sum y_i E(X_i) = \bar{x}\sum y_i = n\bar{x}\bar{y}$$

and

$$E(r) = 0$$
Because the correlation coefficient is invariant over shifts in location, for conve-
nience we will assume temporarily that the x's and y's are shifted so that
x̄ = ȳ = 0. Then Σ_{k≠l} x_k x_l = −Σ x_k² = −(n − 1)s_x², and similarly for the y's, so

$$\operatorname{Var}\Big(\sum X_i y_i\Big) = \sum_i y_i^2 \operatorname{Var}(X_i) + \sum_{i \ne j} y_i y_j \operatorname{Cov}(X_i, X_j)$$

$$= \sum_i y_i^2\,\frac{(n-1)s_x^2}{n} + \sum_{i \ne j} y_i y_j\,\frac{\sum_{k \ne l} x_k x_l}{n(n-1)}$$

$$= \frac{(n-1)^2 s_x^2 s_y^2}{n} + \frac{(n-1)^2 s_x^2 s_y^2}{n(n-1)}$$

$$= (n-1)s_x^2 s_y^2$$

Thus

$$\operatorname{Var}(r) = \frac{1}{(n-1)^2 s_x^2 s_y^2}\operatorname{Var}\Big(\sum X_i y_i\Big) = \frac{1}{n-1}$$

These moments were calculated conditionally, given fixed values of (x_i, y_i), but
because the results do not depend on (x_i, y_i), the moments are also true uncondi-
tionally.
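The permutation moments E(r) = 0 and Var(r) = 1/(n − 1) hold exactly for any fixed data with nonzero spread, and a brute-force check makes this concrete. The Python sketch below computes r for all n! pairings of a small hypothetical data set and compares the mean and variance of r with these values. The helper names and data are illustrative assumptions.

```python
from itertools import permutations
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_moments_of_r(x, y):
    """Mean and variance of r over all n! equally likely pairings with fixed y's."""
    rs = [pearson_r(list(p), y) for p in permutations(x)]
    mean = sum(rs) / len(rs)
    var = sum((r - mean) ** 2 for r in rs) / len(rs)
    return mean, var


# Hypothetical data, for illustration only.
x = [2.0, 3.5, 1.0, 7.0, 4.0]
y = [1.0, 2.0, 2.5, 6.0, 3.0]
mean_r, var_r = permutation_moments_of_r(x, y)
print(mean_r, var_r, 1 / (len(x) - 1))
```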
It can be shown that a good large-sample approximation is given by

$$r \approx N\Big(0, \frac{1}{n-1}\Big)$$

It is interesting that for large n, approximately,

$$E(r^3) = 0 \quad\text{and}\quad E(r^4) = \frac{3}{n^2 - 1}$$

These four moments are precisely the first four moments of the exact distribution
of r based on random samples from a bivariate normal distribution. Thus a very
close approximation for the “permutation” distribution of r, which is quite accu-
rate even for small n, is obtained by using the exact distribution of r under
normal theory. We will find in the next chapter that under the hypothesis of
independence, the sample correlation coefficient can be transformed into a sta-
tistic that is t distributed. In particular,

$$\frac{\sqrt{n-2}\,r}{\sqrt{1 - r^2}} \sim t(n-2)$$
Basically, the preceding results suggest that the test for independence devel-
oped under normal theory is very robust in this case, and one does not need to
worry much about the validity of the normal assumptions for moderate sample
sizes. If one wishes to determine an exact nonparametric test for very small n, say
5 or 10, then it is again convenient to make use of ranks. The rank correlation
coefficient also may be useful for testing randomness.


Example 14.8.1 Consider again the paired-sample data given in Example 14.5.2. The correlation
coefficient for that set of paired data is r = 0.96. It is clear that the pairing was
effective in this case and that the samples are highly correlated without per-
forming tests of independence. The approximate t statistic in this case is

t = √18 r/√(1 − r²) = 14.8

Again, it appears safe to use Student's t distribution based on normal theory
unless n is very small.


SPEARMAN’S RANK CORRELATION COEFFICIENT


Again consider n pairs of observations; this time, however, the pairs already are
ordered according to the y_i. Thus the pairs will be denoted by (x_i, y_{i:n}), i = 1, ..., n,
where the y_{i:n} are the fixed ordered y observations, and x_i denotes the x value
paired with the ith smallest y value. We will replace the observed values with
ranks. Let W_i = rank(y_{i:n}) = i, and let U_i = rank(x_i) denote the rank of the x value
that is paired with the ith smallest y value. The sample correlation coefficient
based on these ranks is referred to as Spearman's rank correlation coefficient, R_s.
It may be conveniently expressed in terms of the differences of the ranks,

$$d_i = U_i - i$$

We have

$$\bar{W} = \bar{U} = \frac{n+1}{2}$$

$$(n-1)s_W^2 = (n-1)s_U^2 = \sum i^2 - n\bar{U}^2 = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} = \frac{n(n^2-1)}{12}$$

and

$$\sum d_i^2 = \sum (U_i - i)^2 = \sum U_i^2 - 2\sum iU_i + \sum i^2 = 2\sum i^2 - 2\sum iU_i = \frac{n(n+1)(2n+1)}{3} - 2\sum iU_i$$

so

$$R_s = \frac{\sum W_i U_i - n\bar{W}\bar{U}}{(n-1)s_W s_U} = \frac{\sum iU_i - n(n+1)^2/4}{n(n^2-1)/12} = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
If there is total agreement in the rankings, then each d_i = 0 and R_s = 1. In the
case of perfect disagreement, d_i = n − i + 1 − i = n − 2i + 1, and R_s = −1, as it
should. Of course, one would reject the hypothesis of independence in favor of a
positive correlation alternative for large values of R_s. Alternatively, one could
compute the p value of the test and reject H_0 if the p value is less than or equal to
α. Note also that the distribution of R_s is symmetric, so Table 14 (Appendix C)
gives the p values

p = P[R_s ≤ −r] = P[R_s ≥ r]

for possible observed values r or −r of R_s for n ≤ 10. For n > 10, approximate
p-values or approximate critical values may be obtained using Student's t dis-
tribution approximation,

$$\frac{\sqrt{n-2}\,R_s}{\sqrt{1 - R_s^2}} \approx t(n-2)$$

For an observed value of R_s = r_s, a size α test of H_0: F(x, y) = F_1(x)F_2(y) against
H_a: "positive correlation" is to reject H_0 if p = P[R_s ≥ r_s] ≤ α, or approximately
if t = √(n − 2) r_s/√(1 − r_s²) ≥ t_{1−α}(n − 2). For the alternative H_a: "negative corre-
lation," reject H_0 if p = P[R_s ≤ r_s] ≤ α, or approximately if t ≤ −t_{1−α}(n − 2).


The ARE of R_s compared to R under normal assumptions is (3/π)² = 0.91.


Now we will compute Spearman's rank correlation coefficient for the paired data
considered in Examples 14.5.2 and 14.8.1. Replacing the observations with their
ranks gives the following results:

Rank(x_i)    1   2   3   4   5   6   7   8   9  10
Rank(y_i)    1   3   2   5   4   7   6   8   9  12
d_i          0   1  −1   1  −1   1  −1   0   0   2

Rank(x_i)   11  12  13  14  15  16  17  18  19  20
Rank(y_i)   13  14  10  15  11  16  18  19  17  20
d_i          2   2  −3   1  −4   0   1   1  −2   0
Any convenient procedure for computing the sample correlation coefficient
may be applied to the ranks to obtain Spearman's rank correlation coefficient,
but because the x_i already were ordered, it is convenient to use the simplified
formula based on the d_i, which gives

$$R_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6(50)}{20(20^2 - 1)} = 0.962$$
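When the ranks are already in hand and there are no ties, R_s follows from the rank differences alone. The Python sketch below (hypothetical function names) applies the simplified formula to the ranks tabulated above. It also evaluates the t transformation √(n − 2) R_s/√(1 − R_s²) that the text recommends for n > 10.

```python
from math import sqrt


def spearman_from_rank_differences(d):
    """R_s = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)); valid when there are no ties."""
    n = len(d)
    return 1 - 6 * sum(v * v for v in d) / (n * (n * n - 1))


def t_approximation(r, n):
    """sqrt(n - 2) * r / sqrt(1 - r^2), referred to Student's t with n - 2 df.
    Undefined when r = +1 or -1."""
    return sqrt(n - 2) * r / sqrt(1 - r * r)


rank_x = list(range(1, 21))
rank_y = [1, 3, 2, 5, 4, 7, 6, 8, 9, 12, 13, 14, 10, 15, 11, 16, 18, 19, 17, 20]
d = [ry - rx for rx, ry in zip(rank_x, rank_y)]
r_s = spearman_from_rank_differences(d)
print(sum(v * v for v in d), round(r_s, 3), round(t_approximation(r_s, 20), 1))
```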


TEST OF RANDOMNESS 


Consider n observations x_1, ..., x_n. The order of this sequence of observations
may be determined by some other variable, Y, such as time. We may ask whether
these observations represent a random sample or whether there is some sort of
trend associated with the order of the observations. A test of randomness against
a trend alternative is accomplished by a test of independence of X and Y. The Y
variable usually is not a random variable but is a labeling, such as a fixed
sequence of times at which the observations are taken. In terms of ranks, the
subscripts of the x's are the ranks of the y variable, and Spearman's rank corre-
lation coefficient is computed as described earlier. A test of H_0: F_1(x) = ··· = F_n(x)
against a one-sided alternative of the type H_a: F_1(x) > F_2(x) > ··· > F_n(x), for all x,
would be carried out by rejecting H_0 for large values of R_s. This alternative
represents an upward trend alternative.

Under normal assumptions, a particular type of trend alternative is one in
which the mean of the variable X_i is a linear function of i,

$$X_i \sim N(\beta_0 + \beta_1 i,\ \sigma^2), \qquad i = 1, \ldots, n$$

In this framework, a test of H_0: β_1 = 0 corresponds to a test of randomness. The
usual likelihood ratio test for this case is UMP for one-sided alternatives, and it
turns out that the ARE of the nonparametric test based on R_s compared to the
likelihood ratio test is (3/π)^{1/3} = 0.98 when the normality assumptions hold.

There are, of course, other types of nonrandomness besides upward or down-
ward trends. For example, there could be a cyclic effect. Various tests have been
developed based on runs, and one of these is discussed in the next section.


In Example 14.7.1 the lifetimes between successive repairs of airplane air condi-
tioners were considered as a random sample. If the air conditioners were not
restored to like-new conditions, one might suspect a downward trend in the life-
times. The lifetimes from the first plane and their order of occurrence are shown
below:

i            1    2    3    4    5    6    7    8    9   10   11   12
x_i         23  261   87    7  120   14   62   47  225   71  246   21
Rank(x_i)    4   12    8    1    9    2    6    5   10    7   11    3
d_i          3   10    5   −3    4   −4   −1   −3    1   −3    0   −9
We find

$$R_s = 1 - \frac{6(276)}{12(12^2 - 1)} = 0.035$$

Because R_s > 0, there is certainly no evidence of a downward trend. If we con-
sider a two-sided alternative, for α = 0.10, then we find c_{0.95} = 0.497 and there is
still no evidence to reject randomness.
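In the trend test the "y" ranks are simply the time order 1, ..., n, so only the raw observations need to be ranked. The Python sketch below (hypothetical helper names) ranks the lifetimes, forms d_i = rank(x_i) − i, and applies the same R_s formula as above. It assumes there are no tied observations, which holds for these twelve values.

```python
def ranks_no_ties(values):
    """Ranks 1..n in ascending order (assumes all values are distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def trend_spearman(x):
    """Spearman's R_s between the observation order i = 1..n and the ranks of x.
    Returns (R_s, list of d_i)."""
    n = len(x)
    d = [r - i for i, r in enumerate(ranks_no_ties(x), start=1)]
    return 1 - 6 * sum(v * v for v in d) / (n * (n * n - 1)), d


lifetimes = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21]
r_s, d = trend_spearman(lifetimes)
print(d, round(r_s, 3))
```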


WALD-WOLFOWITZ RUNS TEST 


Consider a sequence of observations listed in order of occurrence, which we wish
to test for randomness. Suppose that the observations can be reduced to two
types, say a and b. Let T be the total number of runs of like elements in the
sequence. For example, the following numbers were obtained from a “random
number generator” on a computer.

0.1, 0.4, 0.2, 0.8, 0.6, 0.9, 0.3, 0.4, 0.1, 0.2

Let a denote a number less than 0.5 and b denote a number greater than 0.5,
which gives the sequence

aaabbbaaaa

For this sequence, T = 3. A very small value of T suggests nonrandomness, and
a very large value of T also may suggest nonrandomness because of a cyclic
effect.

In this application the number of a's, say A, is a random variable, but given the
number of a's and b's, A and B, there are N = (A + B)!/(A!B!) equally likely per-
mutations of the A + B elements under H_0. Thus the permutations associated
with very small values of T or very large values of T are placed in the critical
region. Again, for a specified value of α = k/N, it is necessary to know what criti-
cal values for T will result in k permutations being included in the critical region.
It is possible to work out the probability distribution analytically for the number
of runs under H_0.

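Given A and B, the conditional probability distribution of the number of runs
under H_0 is

$$P[T = r] = \begin{cases} \dfrac{2\binom{A-1}{r/2-1}\binom{B-1}{r/2-1}}{\binom{A+B}{A}}, & r \text{ even} \\[3ex] \dfrac{\binom{A-1}{(r-1)/2}\binom{B-1}{(r-3)/2} + \binom{A-1}{(r-3)/2}\binom{B-1}{(r-1)/2}}{\binom{A+B}{A}}, & r \text{ odd} \end{cases}$$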
To see this, note that for even r there are exactly r/2 runs of a's and r/2 runs of b's. The
sequence may start with either an a sequence or a b sequence, hence the factor 2.
Now suppose that the sequence starts with an a. The number of ways of having
r/2 runs of A a's is the number of ways of putting \(r/2 - 1\) slashes into the \(A - 1\)
spaces between the a's, which is \(\binom{A-1}{r/2-1}\). Similarly, the number of ways of
dividing the B b's into r/2 runs is \(\binom{B-1}{r/2-1}\), which gives \(\binom{A-1}{r/2-1}\binom{B-1}{r/2-1}\)
for the total number of ways of having r runs starting with a. The number of arrangements with r runs
starting with b would be the same, and this leads to the first equation.

The odd case would be similar, except that if the sequence begins and ends
with an a, then there are \((r + 1)/2\) runs of a's and \((r - 1)/2\) runs of b's. In this case,
the number of ways of placing \([(r + 1)/2] - 1 = (r - 1)/2\) slashes in the \(A - 1\)
spaces between the a's is \(\binom{A-1}{(r-1)/2}\), and the number of ways of placing \([(r - 1)/2] - 1
= (r - 3)/2\) slashes in the \(B - 1\) spaces is \(\binom{B-1}{(r-3)/2}\).
The total number of ways of having r runs beginning and ending with b is
\(\binom{A-1}{(r-3)/2}\binom{B-1}{(r-1)/2}\).
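In the above example A = 7, B = 3, r = 3, and

\[ P[T \le 3] = P[T = 2] + P[T = 3] = \frac{2\binom{6}{0}\binom{2}{0} + \binom{6}{1}\binom{2}{0} + \binom{6}{0}\binom{2}{1}}{\binom{10}{7}} = \frac{2 + 6 + 2}{120} = 0.083 \]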

Thus for a one-sided alternative associated with small T, one could reject \(H_0\) in
this example at the \(\alpha = 0.083\) level.
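The conditional distribution above is easy to tabulate exactly. The following Python sketch (the function name `runs_pmf` is illustrative) returns exact fractions so that the probabilities can be checked against the hand calculation for A = 7, B = 3; it assumes \(A \ge 1\) and \(B \ge 1\).

```python
from fractions import Fraction
from math import comb


def runs_pmf(r, A, B):
    """P[T = r] under H0 for the total number of runs T, given A a's and B b's."""
    if A < 1 or B < 1:
        raise ValueError("need at least one a and one b")
    if r < 2:
        return Fraction(0)
    if r % 2 == 0:
        k = r // 2 - 1   # r/2 - 1 slashes split each letter type into r/2 runs
        ways = 2 * comb(A - 1, k) * comb(B - 1, k)
    else:
        k = (r - 1) // 2  # slashes for the type that has (r + 1)/2 runs
        ways = comb(A - 1, k) * comb(B - 1, k - 1) + comb(A - 1, k - 1) * comb(B - 1, k)
    return Fraction(ways, comb(A + B, A))  # math.comb(n, k) is 0 when k > n


# Example from the text: "aaabbbaaaa" has A = 7, B = 3 and T = 3
A, B = 7, 3
p_le_3 = runs_pmf(2, A, B) + runs_pmf(3, A, B)
print(p_le_3, float(p_le_3))                                 # 1/12 0.08333333333333333
print(sum(runs_pmf(r, A, B) for r in range(2, A + B + 1)))   # 1
```

Summing the probabilities over all possible r is a useful sanity check that both the even and odd cases have been counted correctly.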

Tabulated critical values for this test are available in the literature (see, for
example, Walpole and Myers, 1985, Table A.18), as are large-sample normal
approximations.

The runs test is applicable to testing equality of distributions in two-sample
problems by ranking the combined samples of x's and y's, and then counting the
number of runs. The runs test is not as powerful as the Wilcoxon-Mann-Whitney
test in this case.

It can be shown that

\[ E(T) = \frac{2AB}{A + B} + 1 \]

and

\[ \operatorname{Var}(T) = \frac{2AB(2AB - A - B)}{(A + B)^2(A + B - 1)} \]

and for A and B greater than 10 or so the normal approximation is adequate, in which case the \(\gamma\)th percentile of T is approximately

\[ t_\gamma = E(T) + z_\gamma\sqrt{\operatorname{Var}(T)} \]


Example 14.9.1 We wish to apply the runs test for randomness in Example 14.8.3. The median of
the \(x_i\) is (62 + 71)/2 = 66.5, and labeling each value below the median a and each value above it b, we obtain the following sequence of a's and
b's:

abbabaaabbba

We have r = 7, A = B = 6, and \(P[T \le 7] = 0.61\), \(P[T = 7] = 0.22\), and
\(P[T \ge 7] = 0.61\), so as before we have no evidence at all of nonrandomness.
In this example E(T) = 7, Var(T) = 2.72, and the normal approximation with
correction for discontinuity gives

\[ P[T \le 7] \approx \Phi\left(\frac{7.5 - 7}{\sqrt{2.72}}\right) = 0.62 \]
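A Python sketch of the computations in Example 14.9.1 follows: it counts the runs, sums the exact conditional probabilities, and applies the moment formulas with the continuity correction. The helper names are illustrative, and `statistics.NormalDist` (Python 3.8 or later) supplies the standard normal CDF.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist


def count_runs(seq):
    """Total number of runs of like symbols, e.g. 'aabba' has 3 runs."""
    return 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if prev != cur)


def runs_pmf(r, A, B):
    """Exact P[T = r] under H0, given A a's and B b's (A, B >= 1)."""
    if r < 2:
        return Fraction(0)
    if r % 2 == 0:
        k = r // 2 - 1
        ways = 2 * comb(A - 1, k) * comb(B - 1, k)
    else:
        k = (r - 1) // 2
        ways = comb(A - 1, k) * comb(B - 1, k - 1) + comb(A - 1, k - 1) * comb(B - 1, k)
    return Fraction(ways, comb(A + B, A))


def runs_moments(A, B):
    """E(T) and Var(T) for the total number of runs."""
    n = A + B
    mean = 2 * A * B / n + 1
    var = 2 * A * B * (2 * A * B - A - B) / (n ** 2 * (n - 1))
    return mean, var


seq = "abbabaaabbba"                       # a = below the median 66.5, b = above
r = count_runs(seq)
A, B = seq.count("a"), seq.count("b")
exact_le = sum(runs_pmf(t, A, B) for t in range(2, r + 1))   # exact P[T <= r]
mean, var = runs_moments(A, B)
approx_le = NormalDist().cdf((r + 0.5 - mean) / sqrt(var))   # continuity correction
print(r, A, B, round(float(exact_le), 2), round(mean, 2), round(var, 3), round(approx_le, 2))
# 7 6 6 0.61 7.0 2.727 0.62
```

Here the exact variance is 30/11, which the text reports to two decimals as 2.72.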


SUMMARY 


Our purpose in this chapter was to develop tests of hypotheses, and in some cases 
confidence intervals, that do not require parametric assumptions about the 
model. In many cases, only nominal (categorized) data or ordinal (ranked) data 
are required, rather than numerical (interval) data. 

The one-sample sign test can be used to test a hypothesis about the median of
a continuous distribution, using binomial tables. In the case of a normal distribu-
tion, this would provide an alternative to tests based on parametric assumptions
such as the t test. However, the sign test is less powerful than the t test. A similar
test can be used to test hypotheses about a percentile of a continuous distribu-
tion. Nonparametric confidence intervals also are possible, and these are related to
the problem of nonparametric tolerance limits, which also can be derived.

It is also possible, by means of the two-sample sign test, to test for a difference
in location of two continuous distributions. However, as one might suspect, if it is
applied to test the difference of normal means, then the power is not as high as it
would be with a two-sample t test. The power situation is somewhat better for a
test based on ranks, such as the Wilcoxon-Mann-Whitney tests.

It is also possible to test nonparametrically for independence. One possibility is
to adapt the usual sample correlation coefficient by applying it to the ranks
rather than to the values of the variables. This yields Spearman's rank corre-
lation coefficient.

Another question that arises frequently concerns whether the order of a
sequence of observations has occurred at random or whether it was affected by
some sort of trend associated with the order. A nonparametric test of correlation,
such as the Spearman test, can be used in this situation, but another common
choice is the Wald-Wolfowitz runs test. As noted earlier, this test can be used to
test equality of two distributions, but it is not as powerful as the Wilcoxon-Mann-Whitney
test in this application.


EXERCISES


1. The following 20 observations are obtained from a random number generator.
0.48, 0.10, 0.29, 0.31, 0.86, 0.91, 0.81, 0.92, 0.27, 0.21,
0.31, 0.39, 0.39, 0.47, 0.84, 0.81, 0.97, 0.51, 0.59, 0.70

(a) Test \(H_0: m = 0.5\) against \(H_a: m > 0.5\) at \(\alpha = 0.10\).
(b) Test \(H_0: m = 0.25\) against \(H_a: m > 0.25\) at \(\alpha = 0.10\).


2. The median U.S. family income in 1983 was $24,580.00. The following 20 family incomes
were observed in a random sample from a certain city.

23,470, 48,160, 15,350, 13,670, 5,850, 20,130, 25,570,
20,410, 30,700, 19,340, 26,370, 25,630, 18,920, 21,310,
4,910, 24,840, 17,880, 27,620, 21,660, 12,110

For the median city family income, m, test \(H_0: m = 24{,}800\) against \(H_a: m < 24{,}800\) at
\(\alpha = 0.10\).


3. The median number of hours of weekly TV viewing for children ages 6-11 in 1983 was 25
hours. In an honors class of 50 students, 22 students watched TV more than 25 hours per
week and 28 students watched TV less than 25 hours per week. For this class, test
\(H_0: m = 25\) against \(H_a: m < 25\) at \(\alpha = 0.05\).


4. For the data in Exercise 2, test the hypothesis that 10% of the families make less than
$16,000 per year against the alternative that the tenth percentile is less than $16,000.


5. Using the first bus motor failure data in Exercise 15 of Chapter 13, test \(H_0: x_{0.25} = 40{,}000\)
miles against \(H_a: x_{0.25} > 40{,}000\) miles at \(\alpha = 0.01\).


6. Use the data in Exercise 2.
(a) What level lower confidence limit for xo,, can be obtained using x3,,,? 
(b) Obtain an upper confidence limit for x9.45. 
(c) Obtain an approximate 95% lower confidence limit on the median family income. 
7. Consider the data in Exercise 24 of Chapter 4.
(a) Test \(H_0: x_{0.50} = 5.20\) against \(H_a: x_{0.50} > 5.20\) at \(\alpha = 0.05\). Use the normal
approximation

\[ B(x; n, p) \approx \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right) \]

(b) Find an approximate 90% two-sided confidence interval on the median weight.
(c) Find an approximate 95% lower confidence limit on the 25th percentile, \(x_{0.25}\).


8. Consider the data in Exercise 24 of Chapter 4. 
(a) Set a 95% lower tolerance limit for proportion 0.60. 
(b) Set a 90% two-sided tolerance interval for proportion 0.60. 


9. Repeat Exercise 8 for the data in Example 4.6.3. 


10. Consider the data in Exercise 24 of Chapter 4. Determine an interval such that one could
expect 94.5% of the weights of such major league baseballs to fall.


11. Ten brand A tires and 10 brand B tires were selected at random, and one brand A tire and
one brand B tire were placed on the back wheels of each of 10 cars. The following
distances to wearout in thousands of miles were recorded:

Car   1   2   3   4   5   6   7   8   9  10
A    23  20  26  25  48  26  25  24  16  20
B    20  30  16  33  23  24   8  21  13  18

(a) Assume that the differences are normally distributed, and use a paired-sample t test
to test \(H_0: \mu_A = \mu_B\) against \(H_a: \mu_A > \mu_B\) at \(\alpha = 0.10\).
(b) Rework (a) using the two-sample sign test.


12. Suppose that 20 people are selected at random and asked to compare soda drink A
against soda drink B. If 15 prefer A over B, then test the hypothesis that soda B is
preferable against the alternative that more people prefer brand A at \(\alpha = 0.05\).


13. Twelve pairs of twin male lambs were selected; diet plan I was given to one twin and diet
plan II to the other twin in each case. The weights at eight months were as follows.

Diet I:   111 102  90 110 108 125  99 121 133 115  90 101
Diet II:   97  90  96  95 110 107  85 104 119  98  97 104

(a) Use the sign test to test the hypothesis that there is no difference in the diets against
the alternative that diet I is preferable to diet II at \(\alpha = 0.10\).
(b) Repeat (a) using the Wilcoxon paired-sample signed-rank test. Because n is only 12,
use the table, but also work using the large-sample normal results for illustration
purposes.


14. Siegel (1956, page 85) gives the following data on the number of nonsense syllables
remembered under shock and nonshock conditions for 15 subjects:

Subject    1  2  3  4
Nonshock   5
Shock      2

Test \(H_0\) that there is no difference in retention under the two conditions against the
alternative that more syllables are remembered under nonshock conditions at \(\alpha = 0.10\).
Use the Wilcoxon paired-sample signed-rank test.

15. Davis (1952) gives the lifetimes in hours of 40-watt incandescent lamps from a forced-life
test on lamps produced in the two indicated weeks:

1-2-47:    1067  919 1196  785 1126  936  918 1156  920  948
10-2-47:   1105 1243 1204 1203 1310 1262 1234 1104 1303 1185

Test the hypothesis that the manufacturing process is unchanged for the two different
periods at \(\alpha = 0.10\).
(a) Work out both the small-sample and large-sample tests based on U.
(b) Although these are not paired samples, work out the test based on the Wilcoxon
paired-sample test.

16. The following fatigue failure times of ball bearings were obtained from two different
testers. Test the hypothesis that there is no difference in testers, \(H_0: F(x) = F(y)\), against
\(H_a: F(x) \ne F(y)\). (Use \(\alpha = 0.10\).)

Tester 1:  140.3 158.0 183.9 132.7 117.8  98.7 164.8 136.6  93.4 116.6
Tester 2:  193.0 172.5 173.3 204.7 172.0 152.7 234.9 216.5 422.6

17. In Exercise 11, test the hypothesis that the brand A and brand B samples are independent
at \(\alpha = 0.10\).
(a) Use Pearson's r with the approximate t distribution.
(b) Use Spearman's \(R_s\). Compare the small-sample and large-sample approximations in
this case.

18. Consider the data in Exercise 13.
(a) Estimate the correlation between the responses \((x_i, y_i)\) on the twin lambs.
(b) Test \(H_0: F(x, y) = F_1(x)F_2(y)\) against the alternative of a positive correlation at
level \(\alpha = 0.05\), using Pearson's r. Compare the asymptotic normal approximation
with the approximate t result in this case.
(c) Repeat the test in (b) using \(R_s\).
19. In a pig-judging contest, an official judge and a 4-H Club member each ranked 10 pigs as
follows (Dixon and Massey, 1957, page 303):

4-H Member  7 6 4 9 2 3 8 5 10 1

Test the hypothesis of independence against the alternative of a positive correlation at
\(\alpha = 0.10\).


20. Use \(R_s\) to test whether the data in Exercise 1 are random against the alternative of an
upward trend at \(\alpha = 0.10\).


21. Proschan (1963) gives the times of successive failures of the air-conditioning system of
Boeing 720 jet airplanes. The times between failures on Plane 7908 are given below:

413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118, 34, 31, 18, 18, 67,
57, 62, 7, 22, 34

If the failures occur according to a Poisson process, then the times between failures should
be independent exponential variables. Otherwise, wearout or degradation may be
occurring, and one might expect a downward trend in the times between failures. Test the
hypothesis of randomness against the alternative of a downward trend at \(\alpha = 0.10\).


22. The following values represent the times between accidents in a large factory:
8.66, 11.28, 10.43, 10.89, 11.49, 11.44, 15.92, 12.50, 13.86, 13.32

Test the hypothesis of randomness against an upward trend at \(\alpha = 0.05\).

23. Use the runs test to test randomness of the numbers in Exercise 1.

24. Use the runs test to work Exercise 16.

25. Suppose that a runs test is based on the number of runs of a's rather than the total
number of runs.
(a) Show that the probability of k runs of a's is given by

\[ \frac{\binom{A-1}{k-1}\binom{B+1}{k}}{\binom{A+B}{A}} \]

(b) Rework Exercise 23 by using (a).

26. For the Mann-Whitney statistic U, show that Var(U) is given by equation (14.7.6).


REGRESSION AND LINEAR MODELS


15.1 INTRODUCTION


Random variables that are observed in an experiment often are related to one or
more other variables. For example, the yield of a chemical reaction will be
affected by variables such as temperature or reaction time. We will consider a
statistical method known as regression analysis that deals with such problems.
The term regression was used by Francis Galton, a nineteenth-century scientist,
to describe a phenomenon involving heights of fathers and sons. Specifically, the
study considered paired data, \((x_1, y_1), \ldots, (x_n, y_n)\), where \(x_i\) and \(y_i\) represent,
respectively, the heights of the ith father and his son. One result of this work was
the derivation of a linear relationship \(y = a + bx\) for use in predicting a son's
height given the father's. It was observed that if a father was taller than average,
then the son tended also to be taller than average, but not by as much as the
father. Similarly, sons of fathers who were shorter than average tended to be
shorter than average, but not by as much as the father. This effect, which is
known as regression toward the mean, provides the origin of the term regression
analysis, although the method is applicable to a wide variety of problems.


15.2 LINEAR REGRESSION


We will consider situations in which the result of an experiment is modeled as a
random variable \(Y_x\), whose distribution depends on another variable x or vector
of variables \(\mathbf{x} = (x_0, x_1, \ldots, x_p)\). Typically, the distribution of \(Y_x\) also will involve
one or more unknown parameters. We will consider the situation in which the
expectation is a linear function of the parameters

\[ E(Y_x) = \beta_0 x_0 + \beta_1 x_1 + \cdots + \beta_p x_p \tag{15.2.1} \]

with unknown parameters \(\beta_0, \beta_1, \ldots, \beta_p\). Usually, it also is assumed that the
variance does not depend on x, \(\operatorname{Var}(Y_x) = \sigma^2\). Other notations, which are some-
times used for the expectation in equation (15.2.1), are \(\mu_{Y|x}\) or \(E(Y \mid x)\), but these
notations will not represent a conditional expectation in the usual sense unless
\(x_0, x_1, \ldots, x_p\) are values of a set of random variables. Unless otherwise indicated,
we will assume that the values \(x_0, x_1, \ldots, x_p\) are fixed or measured without error
by the experimenter.

A model whose expectation is a linear function of the parameters, such as
(15.2.1), will be called a linear regression model. This does not require that the
model be linear in the \(x_i\)'s. For example, one might wish to consider such models
as \(E(Y_x) = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_0 x_1\) or \(E(Y_x) = \beta_0 e^{x} + \beta_1 e^{2x}\), which are both
linear in the coefficients but not in the variables. Another important example is
the polynomial regression model in which the \(x_i\)'s are integer powers of a common
variable x. In particular, for some p = 1, 2, ...,

\[ E(Y_x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p \tag{15.2.2} \]

Some regression models involve functions that are not linear in the parameters,
but we will not consider nonlinear regression models here.
Another way to formulate a linear regression model is

\[ Y_x = \beta_0 x_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon \tag{15.2.3} \]

in which \(\varepsilon\) is interpreted as a random error with \(E(\varepsilon) = 0\) and \(\operatorname{Var}(\varepsilon) = \sigma^2\).

It also is possible to have a constant term by taking the first component in x to
be 1. That is, if \(\mathbf{x} = (1, x_1, \ldots, x_p)\), then \(E(Y_x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p\). In the
next section we will study the important special case in which p = 1 and the
model is linear in \(x_1 = x\).
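To see why a polynomial model such as (15.2.2) is still a linear regression model, note that it can be fitted by ordinary least squares on the columns 1, x, and \(x^2\). The NumPy sketch below uses hypothetical noise-free data generated from \(E(Y_x) = 1 + 0.5x - 0.2x^2\), so the fitted coefficients should recover those values; `numpy.vander` builds the columns and `numpy.linalg.lstsq` solves the least-squares problem.

```python
import numpy as np

# Hypothetical, noise-free responses from E(Y_x) = 1.0 + 0.5 x - 0.2 x^2  (p = 2)
x = np.linspace(0.0, 4.0, 9)
true_beta = np.array([1.0, 0.5, -0.2])

# Columns x^0, x^1, x^2: nonlinear in x, but E(Y_x) = X @ beta is linear in beta
X = np.vander(x, N=3, increasing=True)
y = X @ true_beta

beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(rank, np.allclose(beta_hat, true_beta))  # 3 True
```

The same device works for any set of known functions of x, such as \(e^x\) and \(e^{2x}\): only the columns of the matrix change.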
15.3 SIMPLE LINEAR REGRESSION


Consider a model of the form

\[ Y_x = \beta_0 + \beta_1 x + \varepsilon \tag{15.3.1} \]

with \(E(\varepsilon) = 0\) and \(\operatorname{Var}(\varepsilon) = \sigma^2\). In this section we will develop the properties of
such a model, called the simple linear model, under two different sets of assump-
tions. First we will consider the problem of estimation of the coefficients \(\beta_0\) and
\(\beta_1\) under the assumption that errors are uncorrelated.


LEAST-SQUARES APPROACH 


Suppose \(x_1, \ldots, x_n\) are fixed real numbers and that experiments are performed at
each of these values, yielding observed values of a set of n uncorrelated random
variables of form (15.3.1). For convenience we will denote the subscripts by i
rather than \(x_i\). Thus, we will assume that for i = 1, ..., n,

\[ E(Y_i) = \beta_0 + \beta_1 x_i \qquad \operatorname{Var}(Y_i) = \sigma^2 \qquad \operatorname{Cov}(Y_i, Y_j) = 0, \quad i \ne j \]

The resulting data will be represented as pairs \((x_1, y_1), \ldots, (x_n, y_n)\).
Suppose we write the observed value of each \(Y_i\) as \(y_i = \beta_0 + \beta_1 x_i + e_i\), so that
\(e_i\) is the difference between what is actually observed on the ith trial and the
theoretical value \(E(Y_i)\). The ideal situation would be for the pairs \((x_i, y_i)\) all to fall
on a straight line, with all the \(e_i = 0\), in which case a linear function could be
determined algebraically. However, this is not likely because the \(y_i\)'s are observed
values of a set of random variables. The next best thing would be to fit a straight
line through the points \((x_i, y_i)\) in such a way as to minimize, in some sense, the
resulting observed deviations of the \(y_i\) from the fitted line. That is, we choose a
line that minimizes some function of the \(e_i = y_i - \beta_0 - \beta_1 x_i\). Different criteria for
goodness-of-fit lead to different functions of \(e_i\), but we will use a standard
approach called the Principle of Least Squares, which says to minimize the sum of
the squared deviations from the fitted line. That is, we wish to find the values of
\(\beta_0\) and \(\beta_1\), say \(\hat\beta_0\) and \(\hat\beta_1\), that minimize the sum

\[ S = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \]

Taking derivatives of S with respect to \(\beta_0\) and \(\beta_1\) and setting them equal to zero
gives the least-squares (LS) estimates \(\hat\beta_0\) and \(\hat\beta_1\) as solutions to the equations

\[ 2\sum_{i=1}^{n} [y_i - \hat\beta_0 - \hat\beta_1 x_i](-1) = 0 \]

\[ 2\sum_{i=1}^{n} [y_i - \hat\beta_0 - \hat\beta_1 x_i](-x_i) = 0 \]
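Simultaneous solution gives

\[ \hat\beta_1 = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \frac{\sum (x_i - \bar x) y_i}{\sum (x_i - \bar x)^2} \]

and

\[ \hat\beta_0 = \bar y - \hat\beta_1 \bar x \]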
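The deviation form of these estimates translates directly into code. The following Python sketch (the function name `least_squares_line` is illustrative) computes \(\hat\beta_1\) and \(\hat\beta_0\) from the deviations \(x_i - \bar x\), then checks on a small hypothetical data set that the residuals satisfy the two normal equations above.

```python
def least_squares_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x_i are equal, so the slope is not determined")
    sxy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))   # sum (x_i - x_bar) y_i
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


x = [0.0, 1.0, 2.0]
y = [1.0, 2.0, 4.0]
b0, b1 = least_squares_line(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
# Normal equations: sum of residuals = 0 and sum of x_i * residual_i = 0
print(b1, round(b0, 4),
      abs(sum(residuals)) < 1e-12,
      abs(sum(xi * e for xi, e in zip(x, residuals))) < 1e-12)  # 1.5 0.8333 True True
```

In floating-point arithmetic the deviation form is usually preferred to the equivalent computing form \(\sum x_i^2 - (\sum x_i)^2/n\), which subtracts two large, nearly equal numbers and can lose precision.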
Thus, if one wishes to fit a straight line through a set of points, the equation
\(\hat y = \hat\beta_0 + \hat\beta_1 x\) provides a straight line that minimizes the sum of squares of the
errors between observed values and the points on the line, say

\[ \text{SSE} = \sum_{i=1}^{n} \hat e_i^2 = \sum_{i=1}^{n} [y_i - \hat\beta_0 - \hat\beta_1 x_i]^2 \]

The quantities \(\hat e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i\) are known as residuals and SSE is called the
error sum of squares. The least-squares principle does not provide a direct esti-
mate of \(\sigma^2\), but the magnitude of the variance is reflected in the quantity SSE. It
can be shown (see Exercise 8) that an unbiased estimate of \(\sigma^2\) is given by

\[ \tilde\sigma^2 = \frac{\text{SSE}}{n - 2} \]


The notation \(\tilde\sigma^2\) will be used throughout the chapter for an unbiased estimator of
\(\sigma^2\). The notation \(\hat\sigma^2\) will represent the MLE, which is derived later in the section.
The following convenient form also can be derived (see Exercise 5):

\[ \text{SSE} = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i \]
Also

\[ \hat y = \hat\beta_0 + \hat\beta_1 x \]

may be used to predict the value of \(Y_x\), and the same quantity would be used to
estimate the expected value of \(Y_x\). That is, an estimate of \(E(Y_x) = \beta_0 + \beta_1 x\) is
given by

\[ \hat E(Y_x) = \hat\beta_0 + \hat\beta_1 x \]

Note also that \(\hat y = \bar y + \hat\beta_1(x - \bar x)\), which reflects the regression adjustment being
made to the overall mean, \(\bar y\).

Other linear combinations of \(\beta_0\) and \(\beta_1\) could be estimated in a similar
manner. The LS estimators are linear functions of the \(Y_i\)'s, and it can be shown
that among all such linear unbiased estimators, the LS estimators have minimum
variance. Thus, the LS estimators often are referred to as Best Linear Unbiased
Estimators (BLUEs).
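Theorem 15.3.1 If \(E(Y_i) = \beta_0 + \beta_1 x_i\), \(\operatorname{Var}(Y_i) = \sigma^2\), and \(\operatorname{Cov}(Y_i, Y_j) = 0\) for \(i \ne j\) and i = 1, ..., n,
then the LS estimators have the following properties:

1. \(E(\hat\beta_1) = \beta_1\), \(\quad \operatorname{Var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}\)

2. \(E(\hat\beta_0) = \beta_0\), \(\quad \operatorname{Var}(\hat\beta_0) = \dfrac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2}\)

3. \(E(c_1\hat\beta_0 + c_2\hat\beta_1) = c_1\beta_0 + c_2\beta_1\)

4. \(c_1\hat\beta_0 + c_2\hat\beta_1\) is the BLUE of \(c_1\beta_0 + c_2\beta_1\)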
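A quick Monte Carlo check of Parts 1 and 2 can make the variance formulas concrete. The NumPy sketch below is only illustrative: the design points, true coefficients, \(\sigma\), seed, and replication count are hypothetical choices, and the errors are drawn from a uniform distribution with variance \(\sigma^2\) to emphasize that the theorem assumes only uncorrelated errors with common variance, not normality.

```python
import numpy as np

rng = np.random.default_rng(2024)
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])      # hypothetical fixed design points
beta0, beta1, sigma = 2.0, 0.5, 0.5               # hypothetical true values
n_rep = 200_000

# Uniform(-h, h) errors have variance h^2 / 3, so h = sigma * sqrt(3) gives variance sigma^2
h = sigma * np.sqrt(3.0)
Y = beta0 + beta1 * x + rng.uniform(-h, h, size=(n_rep, x.size))  # one data set per row

x_bar = x.mean()
sxx = ((x - x_bar) ** 2).sum()
b1 = ((x - x_bar) * Y).sum(axis=1) / sxx          # slope estimate for each data set
b0 = Y.mean(axis=1) - b1 * x_bar                  # intercept estimate for each data set

var_b1 = sigma ** 2 / sxx                                  # Part 1
var_b0 = sigma ** 2 * (x ** 2).sum() / (x.size * sxx)      # Part 2
print(b1.mean(), b0.mean())                                # close to 0.5 and 2.0
print(b1.var() / var_b1, b0.var() / var_b0)                # both ratios close to 1
```

Because the estimates are averaged over many simulated data sets, the agreement is only approximate; increasing `n_rep` tightens it.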


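Proof
Part 1:

\[ \begin{aligned} E\Big[\sum (x_i - \bar x) Y_i\Big] &= \sum (x_i - \bar x) E(Y_i) \\ &= \sum (x_i - \bar x)(\beta_0 + \beta_1 x_i) \\ &= \beta_0 \sum (x_i - \bar x) + \beta_1 \sum (x_i - \bar x) x_i \\ &= 0 \cdot \beta_0 + \beta_1 \sum (x_i - \bar x) x_i \\ &= \beta_1 \sum (x_i - \bar x)(x_i - \bar x + \bar x) \\ &= \beta_1 \sum (x_i - \bar x)^2 \end{aligned} \]

and it follows that

\[ E(\hat\beta_1) = \frac{\beta_1 \sum (x_i - \bar x)^2}{\sum (x_i - \bar x)^2} = \beta_1 \]

Also, because the \(Y_i\) are uncorrelated,

\[ \begin{aligned} \operatorname{Var}(\hat\beta_1) &= \operatorname{Var}\left[\frac{\sum (x_i - \bar x) Y_i}{\sum (x_i - \bar x)^2}\right] = \frac{\sum (x_i - \bar x)^2 \operatorname{Var}(Y_i)}{[\sum (x_i - \bar x)^2]^2} \\ &= \frac{\sum (x_i - \bar x)^2 \sigma^2}{[\sum (x_i - \bar x)^2]^2} = \frac{\sigma^2}{\sum (x_i - \bar x)^2} \end{aligned} \]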
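Part 2:

\[ \begin{aligned} E(\hat\beta_0) &= E(\bar Y - \hat\beta_1 \bar x) = \frac{1}{n}\sum E(Y_i) - \beta_1 \bar x \\ &= \frac{1}{n}\sum (\beta_0 + \beta_1 x_i) - \beta_1 \bar x = \beta_0 + \beta_1 \bar x - \beta_1 \bar x = \beta_0 \end{aligned} \]

To obtain the variance, \(\hat\beta_0\) can be expressed as a linear combination of the uncorrelated \(Y_i\)'s, and the variance of this linear combination can
be derived. If we let \(b_i = (x_i - \bar x)/\sum (x_j - \bar x)^2\), then \(\hat\beta_1 = \sum b_i Y_i\) and

\[ \hat\beta_0 = \sum_{i=1}^{n} d_i Y_i \]

where \(d_i = 1/n - \bar x b_i\), and

\[ \operatorname{Var}(\hat\beta_0) = \sigma^2 \sum d_i^2 = \frac{\sigma^2 \sum x_i^2}{n \sum (x_i - \bar x)^2} \]

The last step uses \(\sum b_i = 0\), \(\sum b_i^2 = 1/\sum (x_i - \bar x)^2\), and \(\sum (x_i - \bar x)^2 + n\bar x^2 = \sum x_i^2\):

\[ \sum d_i^2 = \frac{1}{n} - \frac{2\bar x}{n}\sum b_i + \bar x^2 \sum b_i^2 = \frac{1}{n} + \frac{\bar x^2}{\sum (x_i - \bar x)^2} = \frac{\sum x_i^2}{n \sum (x_i - \bar x)^2} \]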
Part 3 follows from Parts 1 and 2 and the linearity of expected values.


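Part 4:

Any linear function of the \(Y_i\)'s can be expressed in the form \(c_1\hat\beta_0 + c_2\hat\beta_1 + \sum a_i Y_i\)
for some set of constants \(a_1, \ldots, a_n\). For this to be an unbiased estima-
tor of \(c_1\beta_0 + c_2\beta_1\) requires that \(\sum a_i(\beta_0 + \beta_1 x_i) = 0\) for all \(\beta_0\) and \(\beta_1\), because

\[ E\Big(c_1\hat\beta_0 + c_2\hat\beta_1 + \sum a_i Y_i\Big) = c_1\beta_0 + c_2\beta_1 + \sum a_i(\beta_0 + \beta_1 x_i) \]

But \(\sum a_i(\beta_0 + \beta_1 x_i) = 0\) for all \(\beta_0\) and \(\beta_1\) implies that \(\sum a_i = 0\) and \(\sum a_i x_i = 0\).
Now,

\[ c_1\hat\beta_0 + c_2\hat\beta_1 = \sum (c_1 d_i + c_2 b_i) Y_i \]

and

\[ \begin{aligned} \operatorname{Cov}\Big(c_1\hat\beta_0 + c_2\hat\beta_1, \sum a_i Y_i\Big) &= \sum a_i(c_1 d_i + c_2 b_i)\sigma^2 \\ &= \Big[c_1 \sum a_i d_i + c_2 \sum a_i b_i\Big]\sigma^2 \\ &= \Big[c_1 \sum a_i\Big(\frac{1}{n} - \bar x b_i\Big) + c_2 \sum a_i b_i\Big]\sigma^2 \\ &= \Big[\frac{c_1}{n}\sum a_i + (c_2 - c_1\bar x)\sum a_i b_i\Big]\sigma^2 \\ &= 0 \end{aligned} \]

The last step follows from the result that \(\sum a_i = 0\) and \(\sum a_i x_i = 0\) imply
\(\sum a_i b_i = 0\), which is left as an exercise. Thus

\[ \operatorname{Var}\Big(c_1\hat\beta_0 + c_2\hat\beta_1 + \sum a_i Y_i\Big) = \operatorname{Var}(c_1\hat\beta_0 + c_2\hat\beta_1) + \sum a_i^2 \sigma^2 \]

This variance is minimized by taking \(\sum a_i^2 = 0\), which requires \(a_i = 0\), i = 1, ..., n.
Thus, \(c_1\hat\beta_0 + c_2\hat\beta_1\) is the minimum variance linear unbiased estimator of \(c_1\beta_0
+ c_2\beta_1\), and this concludes the proof.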
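The key step of Part 4, that adding weights \(a_i\) with \(\sum a_i = 0\) and \(\sum a_i x_i = 0\) keeps the estimator unbiased but adds \(\sigma^2 \sum a_i^2\) to its variance, can be checked numerically. In this NumPy sketch the design points and the choice \(c_1 = 1\), \(c_2 = 20\) (estimating \(\beta_0 + 20\beta_1\)) are hypothetical; the vector a is produced by removing from a random vector its least-squares projection onto the columns 1 and x, and variances are expressed in units of \(\sigma^2\).

```python
import numpy as np

x = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])   # hypothetical design points
n = x.size
x_bar = x.mean()
sxx = ((x - x_bar) ** 2).sum()
b = (x - x_bar) / sxx          # beta1_hat = sum b_i Y_i
d = 1.0 / n - x_bar * b        # beta0_hat = sum d_i Y_i
c1, c2 = 1.0, 20.0             # target: c1*beta0 + c2*beta1
w = c1 * d + c2 * b            # weights of the LS estimator

# Weights a with sum(a) = 0 and sum(a * x) = 0: subtract from a random vector
# its least-squares projection onto the columns 1 and x.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(n), x])
v = rng.normal(size=n)
coef, *_ = np.linalg.lstsq(X, v, rcond=None)
a = v - X @ coef

for weights in (w, w + a):
    # Unbiased for c1*beta0 + c2*beta1 iff sum(weights) = c1 and sum(weights * x) = c2
    print(np.isclose(weights.sum(), c1), np.isclose(weights @ x, c2))

var_ls = (w ** 2).sum()          # Var / sigma^2 for uncorrelated Y_i
var_alt = ((w + a) ** 2).sum()
print(np.isclose(a @ b, 0.0), np.isclose(var_alt, var_ls + (a ** 2).sum()), var_alt > var_ls)
```

Both weight vectors pass the unbiasedness check, but only the LS weights attain the smaller variance, which is the content of Part 4.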


Example 15.3.1 In an article about automobile emissions, hydrocarbon emissions (grams per
mile) were given by McDonald and Studden (1990) for several values of accumu-
lated mileage (in 1000s of miles). The following paired data were reported on
mileage (x) versus hydrocarbons (y).

x: 5.133, 10.124, 15.060, 19.946, 24.899, 29.792, 29.877, 35.011, 39.878, 44.862, 49.795
y: 0.265, 0.278, 0.282, 0.286, 0.310, 0.333, 0.343, 0.335, 0.311, 0.345, 0.319

To compute \(\hat\beta_0\) and \(\hat\beta_1\), we note that n = 11 and compute \(\sum x_i = 304.377\),
\(\sum x_i^2 = 10461.814\), \(\sum y_i = 3.407\), \(\sum y_i^2 = 1.063\), and \(\sum x_i y_i = 97.506\). Thus,
\(\bar x = 27.671\), \(\bar y = 0.310\),

\[ \hat\beta_1 = \frac{97.506 - (304.377)(3.407)/11}{10461.814 - (304.377)^2/11} = 0.00158 \]

\[ \hat\beta_0 = 0.310 - (0.00158)(27.671) = 0.266 \]

Thus, if it is desired to predict the amount of hydrocarbons after 30,000 miles,
we compute \(\hat y = 0.266 + 0.00158(30) = 0.313\). Furthermore, SSE = 1.063
- (0.266)(3.407) - (0.00158)(97.506) = 0.00268, and \(\tilde\sigma^2 = 0.00268/9 = 0.000298\).
The estimated linear regression function and the plotted data are shown in
Figure 15.1.
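The hand computations in Example 15.3.1 can be reproduced with a short Python sketch. The sums quoted in the example (\(\sum y_i = 3.407\), \(\sum y_i^2 = 1.063\), \(\sum x_i y_i = 97.506\)) correspond to a second hydrocarbon reading of 0.278, which is the value used here. The function name `fit_line` is illustrative.

```python
# Example 15.3.1: mileage (1000s of miles) and hydrocarbon emissions (g/mi)
mileage = [5.133, 10.124, 15.060, 19.946, 24.899, 29.792,
           29.877, 35.011, 39.878, 44.862, 49.795]
hydrocarbons = [0.265, 0.278, 0.282, 0.286, 0.310, 0.333,
                0.343, 0.335, 0.311, 0.345, 0.319]


def fit_line(x, y):
    """Least-squares intercept and slope, using deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(mileage, hydrocarbons)
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(mileage, hydrocarbons))
sigma2_tilde = sse / (len(mileage) - 2)          # unbiased estimate SSE / (n - 2)
y_hat_30 = b0 + b1 * 30.0                        # prediction at 30,000 miles
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}, y_hat(30) = {y_hat_30:.3f}")
print(f"SSE = {sse:.5f}, sigma2_tilde = {sigma2_tilde:.6f}")
```

With unrounded coefficients, SSE comes out near 0.0028 rather than 0.00268. The convenient form \(\sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i\) subtracts nearly equal quantities, so it is sensitive to rounding \(\hat\beta_0\) and \(\hat\beta_1\) to three significant digits; summing the squared residuals directly avoids that loss of precision.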
[Figure 15.1: Hydrocarbon emissions as a function of accumulated mileage. Vertical axis: Hydrocarbons (gm/mi); horizontal axis: Mileage (× 1000).]


Consider now another problem that was encountered in Example 13.7.3. Recall
that the chi-square goodness-of-fit test requires estimates of the unknown param-
eters \(\mu\) and \(\sigma\). At that time it was discovered that the usual grouped sample
estimate of \(\sigma\) was somewhat larger than the grouped sample MLE and that this
adversely affected the outcome of the test. It also was noted that the grouped
sample MLEs are difficult to compute and a simpler method would be desirable.
The following method makes use of the least-squares approach and provides
estimates that are simple to compute and that appear to be comparable.

Let the data be grouped into c cells and denote the ith cell by \(A_i = (a_{i-1}, a_i]\)
for i = 1, ..., c, and let \(o_i\) be the number of observations in \(A_i\). Let \(\hat F_i =
(o_1 + \cdots + o_i)/n\). Because \(a_0\) is chosen to be less than the smallest observation,
the value \(F(a_0)\) will be negligible, particularly if n is large. Thus, \(\hat F_i\) is an estimate
of the CDF value \(F(a_i)\), which in the present example is \(\Phi((a_i - \mu)/\sigma)\). It follows
that approximately

\[ \Phi^{-1}(\hat F_i) \approx -\frac{\mu}{\sigma} + \frac{1}{\sigma} a_i, \qquad i = 1, \ldots, c - 1 \]

which suggests applying the simple linear regression method with \(x_i = a_i\) and
\(y_i = \Phi^{-1}(\hat F_i)\), \(\beta_0 = -\mu/\sigma\), and \(\beta_1 = 1/\sigma\). The last cell is not used because \(\hat F_c = 1\).
For the radial velocity data of Example 13.7.3, the estimates are \(\hat\beta_0 = 1.606\)
and \(\hat\beta_1 = 0.0778\), which expressed in terms of \(\mu\) and \(\sigma\) give estimates
of \(-20.64\) for \(\mu\) and 12.85 for \(\sigma\). These are remarkably close to the MLEs for
grouped data, which are \(-20.32\) and 12.70. Of course, another possibility would
be to apply the simple linear method with \(x_i = \Phi^{-1}(\hat F_i)\) and \(y_i = a_i\). In this case
the parameterization is simpler because \(\beta_0 = \mu\) and \(\beta_1 = \sigma\), although in this situ-
ation the y is fixed and the x is variable. However, this approach also gives
reasonable estimates in the present example. Specifically, with this modification,
the estimates of \(\mu\) and \(\sigma\) are \(-20.63\) and 12.73, respectively.


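This approach is appropriate for location-scale models in general. Specifically,
if \(F(x) = G((x - \eta)/\theta)\), then approximately \(G^{-1}(\hat F_i) \approx -\eta/\theta + (1/\theta)a_i\), so the simple
linear model can be used with \(\beta_0 = -\eta/\theta\), \(\beta_1 = 1/\theta\), \(y_i = G^{-1}(\hat F_i)\), and \(x_i = a_i\),
and the resulting estimates of \(\theta\) and \(\eta\) would be \(\hat\theta = 1/\hat\beta_1\) and \(\hat\eta = -\hat\beta_0/\hat\beta_1\).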
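The regression-on-quantiles idea can be sketched in Python. The radial velocity data of Example 13.7.3 are not reproduced here, so this sketch uses a synthetic sample of 5000 normal observations with \(\mu = -20\) and \(\sigma = 12\) and hypothetical cell boundaries; the helper names are illustrative. `statistics.NormalDist().inv_cdf` plays the role of \(\Phi^{-1}\); for another location-scale family one would substitute \(G^{-1}\).

```python
import random
from statistics import NormalDist


def fit_line(x, y):
    """Least-squares intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def grouped_normal_ls(data, cutpoints):
    """Estimate (mu, sigma) by regressing Phi^{-1}(F_i) on the cell boundaries a_i.

    `cutpoints` are the interior boundaries a_1 < ... < a_{c-1}; a_0 and a_c are
    assumed to lie below and above all observations, so the last cell is not used.
    """
    n = len(data)
    inv_cdf = NormalDist().inv_cdf
    xs, ys = [], []
    for a in cutpoints:
        f_hat = sum(1 for v in data if v <= a) / n   # (o_1 + ... + o_i) / n
        if 0.0 < f_hat < 1.0:                         # Phi^{-1} is undefined at 0 and 1
            xs.append(a)
            ys.append(inv_cdf(f_hat))
    b0, b1 = fit_line(xs, ys)                         # b0 ~ -mu/sigma, b1 ~ 1/sigma
    return -b0 / b1, 1.0 / b1


rng = random.Random(13)
sample = [rng.gauss(-20.0, 12.0) for _ in range(5000)]   # synthetic data
cuts = [-50.0 + 5.0 * k for k in range(13)]              # -50, -45, ..., 10
mu_hat, sigma_hat = grouped_normal_ls(sample, cuts)
print(round(mu_hat, 2), round(sigma_hat, 2))             # close to -20 and 12
```

Cells whose cumulative proportion is 0 or 1 are skipped because the inverse CDF is infinite there, which mirrors dropping the last cell in the text.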


MAXIMUM LIKELIHOOD APPROACH


Now we will derive the MLEs of \(\beta_0\), \(\beta_1\), and \(\sigma^2\) under the assumption that the
random errors are independent and normal, \(\varepsilon_i \sim N(0, \sigma^2)\). Assume that \(x_1, \ldots, x_n\)
are fixed real numbers and that experiments are performed at each of these
values, yielding observed values of a set of n independent normal random vari-
ables of form (15.3.1). Thus, \(Y_1, \ldots, Y_n\) are independent, \(Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)\).
The resulting data will be represented as pairs \((x_1, y_1), \ldots, (x_n, y_n)\).


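Theorem 15.3.2 If \(Y_1, \ldots, Y_n\) are independent with \(Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)\) for i = 1, ..., n, then the
MLEs are

\[ \hat\beta_1 = \frac{\sum (x_i - \bar x) y_i}{\sum (x_i - \bar x)^2} \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 \]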
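Proof

The likelihood function is

\[ L = L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\right] \tag{15.3.2} \]

Thus, the log-likelihood is

\[ \ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \]

If we set the partials with respect to \(\beta_0\), \(\beta_1\), and \(\sigma^2\) to zero, then we have the ML
equations

\[ n\hat\beta_0 + \Big(\sum_{i=1}^{n} x_i\Big)\hat\beta_1 = \sum_{i=1}^{n} y_i \tag{15.3.3} \]

\[ \Big(\sum_{i=1}^{n} x_i\Big)\hat\beta_0 + \Big(\sum_{i=1}^{n} x_i^2\Big)\hat\beta_1 = \sum_{i=1}^{n} x_i y_i \tag{15.3.4} \]

\[ n\hat\sigma^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 \tag{15.3.5} \]

The MLEs of \(\beta_0\) and \(\beta_1\) are obtained by solving equations (15.3.3) and (15.3.4),
which are linear equations in \(\hat\beta_0\) and \(\hat\beta_1\).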


Notice that the MLEs of \(\beta_0\) and \(\beta_1\) are identical in form to the BLUEs, which
were derived under a much less restrictive set of assumptions. However, it is
possible to establish some useful properties under the present assumptions.


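Theorem 15.3.3 If \(Y_1, \ldots, Y_n\) are independent with \(Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)\) and

\[ S_1 = \sum_{i=1}^{n} Y_i, \qquad S_2 = \sum_{i=1}^{n} x_i Y_i, \qquad S_3 = \sum_{i=1}^{n} Y_i^2 \]

then

1. The statistics \(S_1\), \(S_2\), and \(S_3\) are jointly complete and sufficient for \(\beta_0\), \(\beta_1\),
and \(\sigma^2\).

2. If \(\sigma^2\) is fixed, then \(S_1\) and \(S_2\) are jointly complete and sufficient for \(\beta_0\)
and \(\beta_1\).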
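Proof
Part 1:
The joint pdf of \(Y_1, \ldots, Y_n\), given by equation (15.3.2), can be written as

\[ \begin{aligned} f(y_1, \ldots, y_n) &= (2\pi\sigma^2)^{-n/2} \exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} y_i^2 + \frac{\beta_0}{\sigma^2}\sum_{i=1}^{n} y_i + \frac{\beta_1}{\sigma^2}\sum_{i=1}^{n} x_i y_i - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (\beta_0 + \beta_1 x_i)^2\Big] \\ &= C(\boldsymbol\theta) h(y_1, \ldots, y_n) \exp\Big[q_1(\boldsymbol\theta)\sum_{i=1}^{n} y_i + q_2(\boldsymbol\theta)\sum_{i=1}^{n} x_i y_i + q_3(\boldsymbol\theta)\sum_{i=1}^{n} y_i^2\Big] \end{aligned} \]

with \(\boldsymbol\theta = (\beta_0, \beta_1, \sigma^2)\), \(C(\boldsymbol\theta) = (2\pi\sigma^2)^{-n/2}\exp\big[-\sum (\beta_0 + \beta_1 x_i)^2/(2\sigma^2)\big]\),
\(h(y_1, \ldots, y_n) = 1\), \(q_1(\boldsymbol\theta) = \beta_0/\sigma^2\), \(q_2(\boldsymbol\theta) = \beta_1/\sigma^2\), and \(q_3(\boldsymbol\theta) = -1/(2\sigma^2)\). This is the
multivariate REC form of Chapter 10.

Part 2 follows by rewriting the pdf as

\[ f(y_1, \ldots, y_n) = C(\boldsymbol\theta) h(y_1, \ldots, y_n) \exp\Big[q_1(\boldsymbol\theta)\sum_{i=1}^{n} y_i + q_2(\boldsymbol\theta)\sum_{i=1}^{n} x_i y_i\Big] \]

with the notation now defined as \(\boldsymbol\theta = (\beta_0, \beta_1)\), \(q_1(\boldsymbol\theta) = \beta_0/\sigma^2\), and \(q_2(\boldsymbol\theta) = \beta_1/\sigma^2\),
and

\[ h(y_1, \ldots, y_n) = \exp\Big[-\sum_{i=1}^{n} y_i^2/(2\sigma^2)\Big] \]

Notice that the MLEs \(\hat\beta_0\), \(\hat\beta_1\), and \(\hat\sigma^2\) are jointly complete and sufficient for \(\beta_0\),
\(\beta_1\), and \(\sigma^2\) because they can be expressed as one-to-one functions of \(\sum Y_i\), \(\sum x_i Y_i\), and \(\sum Y_i^2\).
Similarly, if \(\sigma^2\) is fixed then \(\hat\beta_0\) and \(\hat\beta_1\) are functions of \(\sum Y_i\) and \(\sum x_i Y_i\), and thus
they are jointly complete and sufficient for \(\beta_0\) and \(\beta_1\).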
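Because the MLEs are functions of \(S_1\), \(S_2\), and \(S_3\) alone (together with the known \(x_i\)), they can be computed without revisiting the individual observations. The Python sketch below, with an illustrative function name and hypothetical data, solves the ML equations (15.3.3) and (15.3.4) by Cramer's rule and uses the convenient form of SSE to obtain \(\hat\sigma^2 = \text{SSE}/n\), then compares the result with a direct calculation from the residuals.

```python
def mles_from_sufficient_stats(x, s1, s2, s3):
    """MLEs (b0, b1, sigma2_hat) from S1 = sum Y_i, S2 = sum x_i Y_i, S3 = sum Y_i^2."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(xi * xi for xi in x)
    det = n * sum_x2 - sum_x ** 2              # zero only when all x_i are equal
    b0 = (s1 * sum_x2 - sum_x * s2) / det      # Cramer's rule for (15.3.3)-(15.3.4)
    b1 = (n * s2 - sum_x * s1) / det
    sse = s3 - b0 * s1 - b1 * s2               # SSE = sum y^2 - b0 sum y - b1 sum xy
    return b0, b1, sse / n                     # MLE divides SSE by n, not n - 2


x = [1.0, 2.0, 3.0, 4.0, 5.0]                  # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
s1 = sum(y)
s2 = sum(xi * yi for xi, yi in zip(x, y))
s3 = sum(yi * yi for yi in y)
b0, b1, sigma2_hat = mles_from_sufficient_stats(x, s1, s2, s3)

# Direct check from the residuals
direct = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / len(x)
print(round(b0, 4), round(b1, 4), round(sigma2_hat, 4), abs(sigma2_hat - direct) < 1e-9)
# 0.05 1.99 0.0214 True
```

This route is convenient for illustrating sufficiency, but for numerical work the deviation form of the estimates is the safer choice, since \(S_3 - \hat\beta_0 S_1 - \hat\beta_1 S_2\) subtracts nearly equal numbers.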


We note at this point that some aspects of the analysis are simplified if we
consider a related problem in which the \(x_i\)'s are centered about \(\bar x\). Specifically, if
we let \(x_i^* = x_i - \bar x\) and \(\beta_c = \beta_0 + \beta_1 \bar x\), then the variables of Theorem 15.3.2 can
be represented as \(Y_i = \beta_c + \beta_1(x_i - \bar x) + \varepsilon_i = \beta_c + \beta_1 x_i^* + \varepsilon_i\), where \(\sum x_i^* = 0\). In
this representation, \(\hat\beta_c\) has the form

\[ \hat\beta_c = \bar y \]

and

\[ \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} [y_i - \bar y - \hat\beta_1(x_i - \bar x)]^2 \]

It also is easily verified that \(\hat\beta_c\), \(\hat\beta_1\), and \(\hat\sigma^2\) are jointly complete and sufficient for
\(\beta_c\), \(\beta_1\), and \(\sigma^2\), and if \(\sigma^2\) is fixed, then \(\hat\beta_c\) and \(\hat\beta_1\) are jointly complete and sufficient
for \(\beta_c\) and \(\beta_1\).

An interesting property of the centered version that we will verify shortly is
that the MLEs of \(\beta_c\), \(\beta_1\), and \(\sigma^2\) are independent. This property will be useful in
proving the following distributional properties of the MLEs of \(\beta_0\), \(\beta_1\), and \(\sigma^2\).


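Theorem 15.3.4 If \(Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\) with independent errors \(\varepsilon_i \sim N(0, \sigma^2)\), then the MLEs of \(\beta_0\)
and \(\beta_1\) have a bivariate normal distribution with \(E(\hat\beta_0) = \beta_0\), \(E(\hat\beta_1) = \beta_1\), and

\[ \operatorname{Var}(\hat\beta_0) = \frac{\sigma^2 \sum x_i^2}{n \sum (x_i - \bar x)^2} \qquad \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum (x_i - \bar x)^2} \qquad \operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -\frac{\bar x \sigma^2}{\sum (x_i - \bar x)^2} \]

Furthermore, \((\hat\beta_0, \hat\beta_1)\) is independent of \(\hat\sigma^2\), and \(n\hat\sigma^2/\sigma^2 \sim \chi^2(n - 2)\).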
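A Monte Carlo sketch in NumPy can illustrate the covariance formula and the \(\chi^2(n - 2)\) result. The design points, parameter values, seed, and number of replications are hypothetical; with normal errors, the simulated \(n\hat\sigma^2/\sigma^2\) should have mean close to n - 2 and variance close to 2(n - 2), and its sample correlation with \(\hat\beta_0\) should be near zero (zero correlation is a consequence of independence, not a proof of it).

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.array([0.5, 1.0, 2.0, 3.5, 4.0, 6.0, 7.5])   # hypothetical fixed design
beta0, beta1, sigma = 1.0, -0.4, 0.8                # hypothetical true values
n, n_rep = x.size, 100_000

# Each row is one simulated data set Y_1, ..., Y_n with independent N(0, sigma^2) errors
Y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(n_rep, n))
x_bar = x.mean()
sxx = ((x - x_bar) ** 2).sum()
b1 = ((x - x_bar) * Y).sum(axis=1) / sxx
b0 = Y.mean(axis=1) - b1 * x_bar
resid = Y - b0[:, None] - b1[:, None] * x
sigma2_hat = (resid ** 2).mean(axis=1)     # MLE: SSE / n
V = n * sigma2_hat / sigma ** 2            # theory: chi-square with n - 2 degrees of freedom

cov_sim = np.cov(b0, b1)[0, 1]
cov_theory = -x_bar * sigma ** 2 / sxx
print(cov_sim, cov_theory)
print(V.mean(), n - 2)                     # chi-square mean
print(V.var(), 2 * (n - 2))                # chi-square variance
print(np.corrcoef(b0, sigma2_hat)[0, 1])   # near 0
```

Because \(\bar x > 0\) in this design, the simulated covariance of the intercept and slope estimates is negative, as the formula predicts.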


Proof
The proof is somewhat simpler for the centered problem, because in this case the
estimators of the coefficients are uncorrelated and all three MLEs will be inde-
pendent. Our approach will be to prove the result for the centered case and then
extend it to the general case.

We define a set of n + 2 statistics, \(W_i = Y_i - \hat\beta_c - \hat\beta_1(x_i - \bar x)\) for i = 1, ..., n,
\(U_1 = \hat\beta_c\), and \(U_2 = \hat\beta_1\). Notice that each variable is a linear combination of the
independent normal random variables \(Y_1, \ldots, Y_n\). Specifically,

\[ U_1 = \sum_{j=1}^{n} \frac{1}{n} Y_j, \qquad U_2 = \sum_{j=1}^{n} b_j Y_j, \qquad W_i = \sum_{j=1}^{n} c_{ij} Y_j \]

with coefficients \(b_j = (x_j - \bar x)/\sum_{k=1}^{n} (x_k - \bar x)^2\), \(c_{ii} = 1 - 1/n - (x_i - \bar x)b_i\), and
\(c_{ij} = -1/n - (x_i - \bar x)b_j\) if \(j \ne i\). It is easily verified that \(\sum_{j=1}^{n} b_j = 0\) and \(\sum_{j=1}^{n} c_{ij} = 0\)
for each i = 1, ..., n. Using these identities, first we will derive the joint MGF of
the \(U_i\)'s and then derive the joint MGF of the \(W_i\)'s.
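The coefficient identities above are easy to confirm numerically. In this NumPy sketch (hypothetical x and y values), the matrix C has entries \(c_{ij} = \delta_{ij} - 1/n - (x_i - \bar x)b_j\), where \(\delta_{ij}\) is 1 if i = j and 0 otherwise; applying it to a data vector should reproduce the residuals \(W_i\). The last line also checks \(\sum_j (x_j - \bar x)c_{ij} = 0\), which underlies the identity \(\sum_j (x_j - \bar x)d_j = 0\) used for the MGF of the \(W_i\)'s.

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 8.0, 9.0])        # hypothetical design points
y = np.array([2.0, 2.9, 4.2, 7.1, 7.6])        # hypothetical responses
n = x.size
dev = x - x.mean()
b = dev / (dev ** 2).sum()                     # U_2 = sum_j b_j Y_j
C = np.eye(n) - 1.0 / n - np.outer(dev, b)     # C[i, j] = c_ij

resid = y - y.mean() - (b @ y) * dev           # W_i computed directly
print(np.allclose(C @ y, resid))               # rows of C reproduce the W_i
print(np.isclose(b.sum(), 0.0), np.allclose(C.sum(axis=1), 0.0))   # sum_j b_j = 0, sum_j c_ij = 0
print(np.allclose(C @ dev, 0.0))               # sum_j (x_j - x_bar) c_ij = 0
```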
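Let \(\mathbf U = (U_1, U_2)\) and \(\mathbf t = (t_1, t_2)\), and consider

\[ \begin{aligned} M_{\mathbf U}(\mathbf t) &= E[\exp(t_1 U_1 + t_2 U_2)] \\ &= E\Big[\exp\Big(t_1 \sum_{j=1}^{n} (1/n) Y_j + t_2 \sum_{j=1}^{n} b_j Y_j\Big)\Big] \\ &= E\Big[\exp\Big(\sum_{j=1}^{n} a_j Y_j\Big)\Big] \end{aligned} \tag{15.3.6} \]

with \(a_j = t_1/n + t_2 b_j\). It can be verified that \(\sum a_j = t_1\), \(\sum (x_j - \bar x)a_j = t_2\),
and \(\sum a_j^2 = t_1^2/n + t_2^2/\sum (x_j - \bar x)^2\) (see Exercise 12).
Because each \(Y_j\) is normal, its MGF, evaluated at \(a_j\), has the form

\[ M_{Y_j}(a_j) = \exp\big[(\beta_c + \beta_1(x_j - \bar x))a_j + \tfrac{1}{2}\sigma^2 a_j^2\big] \]

Thus, from equation (15.3.6) we have that

\[ \begin{aligned} M_{\mathbf U}(\mathbf t) &= \prod_{j=1}^{n} M_{Y_j}(a_j) \\ &= \prod_{j=1}^{n} \exp\big[(\beta_c + \beta_1(x_j - \bar x))a_j + \tfrac{1}{2}\sigma^2 a_j^2\big] \\ &= \exp\Big[\sum_{j=1}^{n} (\beta_c + \beta_1(x_j - \bar x))a_j + \tfrac{1}{2}\sigma^2 \sum_{j=1}^{n} a_j^2\Big] \\ &= \exp\Big[\beta_c \sum_{j=1}^{n} a_j + \beta_1 \sum_{j=1}^{n} (x_j - \bar x)a_j + \tfrac{1}{2}\sigma^2 \sum_{j=1}^{n} a_j^2\Big] \\ &= \exp\Big[\beta_c t_1 + \beta_1 t_2 + \tfrac{1}{2}\sigma^2 t_1^2/n + \tfrac{1}{2}\sigma^2 t_2^2 \Big/ \sum (x_j - \bar x)^2\Big] \\ &= \exp\big[\beta_c t_1 + \tfrac{1}{2}(\sigma^2/n)t_1^2\big] \exp\Big[\beta_1 t_2 + \tfrac{1}{2}\Big(\sigma^2 \Big/ \sum (x_j - \bar x)^2\Big)t_2^2\Big] \end{aligned} \]

The first factor, a function of \(t_1\) only, is the MGF of \(N(\beta_c, \sigma^2/n)\), and the second
factor, a function of \(t_2\) only, is the MGF of \(N(\beta_1, \sigma^2/\sum (x_j - \bar x)^2)\). Thus, \(\hat\beta_c\) and \(\hat\beta_1\)
are independent with \(\hat\beta_c \sim N(\beta_c, \sigma^2/n)\) and

\[ \hat\beta_1 \sim N\Big(\beta_1, \sigma^2 \Big/ \sum (x_j - \bar x)^2\Big) \]

We know that for fixed \(\sigma^2\), \(\hat\beta_c\) and \(\hat\beta_1\) are complete and sufficient for \(\beta_c\) and \(\beta_1\).
We also know from Theorem 10.4.7 that any other statistic whose distribution
does not depend on \(\beta_c\) and \(\beta_1\) must be independent of \(\hat\beta_c\) and \(\hat\beta_1\). Furthermore,
because this holds for every fixed value of \(\sigma^2\), such a statistic is independent of \(\hat\beta_c\) and \(\hat\beta_1\) even when \(\sigma^2\) is
not fixed.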
The next part consists of showing that the joint distribution of the $W_i$'s does not depend on $\beta_c$ or $\beta_1$. If this can be established, then it follows from Theorem 10.4.7 that the $W_i$'s are independent of the $U_i$'s.

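Let $\mathbf{W} = (W_1, \ldots, W_n)$ and $\mathbf{t} = (t_1, \ldots, t_n)$, and consider

$$M_{\mathbf{W}}(\mathbf{t}) = E\Big[\exp\Big(\sum_{i=1}^{n} t_i W_i\Big)\Big] = E\Big[\exp\Big(\sum_{j=1}^{n} d_j Y_j\Big)\Big] \tag{15.3.7}$$

with $d_j = \sum_{i=1}^{n} t_i c_{ij}$. The following identities are useful in deriving the MGF (see Exercise 12):

$$\sum_{j=1}^{n} d_j = 0, \qquad \sum_{j=1}^{n} (x_j - \bar{x}) d_j = 0, \qquad \sum_{j=1}^{n} d_j^2 = \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} c_{ij} t_i\Big)^2$$

The rest of the derivation is similar to that of the MGF of $U_1$ and $U_2$, except that $a_j$ is replaced with $d_j$ for all $j = 1, \ldots, n$. We note from (15.3.7) that

$$\begin{aligned} M_{\mathbf{W}}(\mathbf{t}) &= \prod_{j=1}^{n} M_{Y_j}(d_j) \\ &= \exp\Big[\beta_c \sum_{j=1}^{n} d_j + \beta_1 \sum_{j=1}^{n} (x_j - \bar{x}) d_j + \tfrac{1}{2}\sigma^2 \sum_{j=1}^{n} d_j^2\Big] \\ &= \exp\Big[\tfrac{1}{2}\sigma^2 \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} c_{ij} t_i\Big)^2\Big] \end{aligned}$$

This last function depends on neither $\beta_c$ nor $\beta_1$, which, as noted earlier, means that $\mathbf{W} = (W_1, \ldots, W_n)$ is independent of $\hat\beta_c$ and $\hat\beta_1$.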
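The key point of this step, that the MGF of $\mathbf{W}$ is free of $\beta_c$ and $\beta_1$, can be illustrated by evaluating the logarithm of $\prod_j M_{Y_j}(d_j)$ at two different parameter pairs. This is a sketch with hypothetical design points and an arbitrary argument vector $\mathbf{t}$; the function name `log_mgf_w` is ours.

```python
def log_mgf_w(x, t, beta_c, beta_1, sigma2):
    """log M_W(t) = sum_j log M_{Y_j}(d_j), with Y_j ~ N(beta_c + beta_1 (x_j - xbar), sigma2)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xj - xbar) ** 2 for xj in x)
    b = [(xj - xbar) / sxx for xj in x]
    c = [[(1.0 if i == j else 0.0) - 1.0 / n - (x[i] - xbar) * b[j]
          for j in range(n)] for i in range(n)]
    d = [sum(t[i] * c[i][j] for i in range(n)) for j in range(n)]   # d_j = sum_i t_i c_ij
    log_m = sum((beta_c + beta_1 * (x[j] - xbar)) * d[j] + 0.5 * sigma2 * d[j] ** 2
                for j in range(n))
    return log_m, d


x = [1.0, 2.0, 4.0, 7.0, 11.0]        # hypothetical design points
t = [0.3, -0.2, 0.5, 0.1, -0.4]       # arbitrary MGF argument
xbar = sum(x) / len(x)

log_m1, d = log_mgf_w(x, t, beta_c=0.0, beta_1=0.0, sigma2=2.0)
log_m2, _ = log_mgf_w(x, t, beta_c=5.0, beta_1=-3.0, sigma2=2.0)

print(abs(sum(d)) < 1e-10)                                          # sum_j d_j = 0
print(abs(sum((xj - xbar) * dj for xj, dj in zip(x, d))) < 1e-10)   # sum_j (x_j - xbar) d_j = 0
print(abs(log_m1 - log_m2) < 1e-10)                                 # same MGF for both parameter pairs
```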
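The rest of the proof relies on the following identity (see Exercise 13):

$$\sum_{i=1}^{n} [Y_i - \beta_c - \beta_1(x_i - \bar{x})]^2 = n\hat\sigma^2 + n(\hat\beta_c - \beta_c)^2 + \sum_{i=1}^{n} (x_i - \bar{x})^2 (\hat\beta_1 - \beta_1)^2 \tag{15.3.8}$$

We define random variables $Z_i = [Y_i - \beta_c - \beta_1(x_i - \bar{x})]/\sigma$ for $i = 1, \ldots, n$, $Z_{n+1} = \sqrt{n}(\hat\beta_c - \beta_c)/\sigma$, and $Z_{n+2} = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,(\hat\beta_1 - \beta_1)/\sigma$, which are all standard normal. From equation (15.3.8) note that

$$\sum_{i=1}^{n} Z_i^2 = \frac{n\hat\sigma^2}{\sigma^2} + Z_{n+1}^2 + Z_{n+2}^2 \tag{15.3.9}$$

If we define $V_1 = \sum_{i=1}^{n} Z_i^2$, $V_2 = n\hat\sigma^2/\sigma^2$, and $V_3 = Z_{n+1}^2 + Z_{n+2}^2$, then $V_1 \sim \chi^2(n)$ and $V_3 \sim \chi^2(2)$. Furthermore, because $V_2 = n\hat\sigma^2/\sigma^2$ is a function of $W_1, \ldots, W_n$ (indeed $n\hat\sigma^2 = \sum_{i=1}^{n} W_i^2$), we know that $V_2$ and $V_3$ are independent. From equation (15.3.9), it follows that $V_1 = V_2 + V_3$, and the MGF of $V_1$ factors as follows:

$$\begin{aligned} M_{V_1}(t) &= M_{V_2}(t)\, M_{V_3}(t) \\ (1 - 2t)^{-n/2} &= M_{V_2}(t)\,(1 - 2t)^{-1} \end{aligned}$$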
Thus, $M_{V_2}(t) = (1 - 2t)^{-(n-2)/2}$, from which it follows that $V_2 \sim \chi^2(n - 2)$. This proves the theorem for the centered problem. The extension to the general case can be accomplished by using the fact that $\hat\beta_0$ is a linear function of $\hat\beta_c$ and $\hat\beta_1$, $\hat\beta_0 = \hat\beta_c - \hat\beta_1 \bar{x}$. Using this relationship, it is straightforward to derive the joint MGF of $\hat\beta_0$ and $\hat\beta_1$ based on the MGF of $\mathbf{U} = (\hat\beta_c, \hat\beta_1)$ (see Exercise 14). This concludes the proof.

According to Theorem 15.3.4, $E(\hat\beta_0) = \beta_0$, $E(\hat\beta_1) = \beta_1$, and $E(n\hat\sigma^2/\sigma^2) = n - 2$, from which it follows that $\hat\beta_0$, $\hat\beta_1$, and $\tilde\sigma^2 = \sum (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2/(n - 2)$ are unbiased estimators. We also know from Theorem 15.3.2 that these estimators are complete and sufficient, yielding the following theorem.

Theorem 15.3.5 If $Y_1, \ldots, Y_n$ are independent, $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, then $\hat\beta_0$, $\hat\beta_1$, and $\tilde\sigma^2$ are UMVUEs of $\beta_0$, $\beta_1$, and $\sigma^2$. ■
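A quick simulation makes the conclusions of Theorem 15.3.4 concrete: $n\hat\sigma^2/\sigma^2$ should behave like a $\chi^2(n - 2)$ variable (mean $n - 2$, variance $2(n - 2)$), and $\hat\beta_c$ and $\hat\beta_1$ should be uncorrelated. This is a Monte Carlo sketch, not part of the proof; the design points, true parameter values, seed, and number of replications are all arbitrary assumptions.

```python
import random
import statistics


def simulate(x, beta_0, beta_1, sigma, reps, seed=1):
    """Simulate the simple linear model; return n*sigma_hat^2/sigma^2, beta_c_hat, beta_1_hat."""
    rng = random.Random(seed)
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    v2, bc, b1 = [], [], []
    for _ in range(reps):
        y = [beta_0 + beta_1 * xi + rng.gauss(0.0, sigma) for xi in x]
        ybar = sum(y) / n                                              # beta_c_hat
        slope = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx   # beta_1_hat
        sse = sum((yi - ybar - slope * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
        v2.append(sse / sigma ** 2)   # n * sigma_hat^2 / sigma^2, since n * sigma_hat^2 = SSE
        bc.append(ybar)
        b1.append(slope)
    return v2, bc, b1


x = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]   # hypothetical design, n = 8
v2, bc, b1 = simulate(x, beta_0=1.0, beta_1=0.5, sigma=2.0, reps=20000)

mean_bc, mean_b1 = statistics.fmean(bc), statistics.fmean(b1)
cov = sum((u - mean_bc) * (v - mean_b1) for u, v in zip(bc, b1)) / len(bc)
corr = cov / (statistics.pstdev(bc) * statistics.pstdev(b1))

print(statistics.fmean(v2))       # close to n - 2 = 6
print(statistics.pvariance(v2))   # close to 2(n - 2) = 12
print(corr)                       # close to 0
```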


It also is possible to derive confidence intervals for the parameters based on the 
above results. 
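Theorem 15.3.6 If $Y_1, \ldots, Y_n$ are independent, $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, then $\gamma \times 100\%$ confidence intervals for $\beta_0$, $\beta_1$, and $\sigma^2$ are given respectively by

1. $$\left(\hat\beta_0 - t_{(1+\gamma)/2}\,\tilde\sigma\sqrt{\frac{\sum x_i^2}{n\sum (x_i - \bar{x})^2}},\ \ \hat\beta_0 + t_{(1+\gamma)/2}\,\tilde\sigma\sqrt{\frac{\sum x_i^2}{n\sum (x_i - \bar{x})^2}}\right)$$
2. $$\left(\hat\beta_1 - \frac{t_{(1+\gamma)/2}\,\tilde\sigma}{\sqrt{\sum (x_i - \bar{x})^2}},\ \ \hat\beta_1 + \frac{t_{(1+\gamma)/2}\,\tilde\sigma}{\sqrt{\sum (x_i - \bar{x})^2}}\right)$$
3. $$\left(\frac{(n - 2)\tilde\sigma^2}{\chi^2_{(1+\gamma)/2}},\ \ \frac{(n - 2)\tilde\sigma^2}{\chi^2_{(1-\gamma)/2}}\right)$$

where the respective $t$ and $\chi^2$ percentiles have $n - 2$ degrees of freedom.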


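Proof
It follows from Theorem 15.3.4 that

$$Z_0 = \frac{\hat\beta_0 - \beta_0}{\sigma\sqrt{\sum x_i^2 \big/ [n\sum (x_i - \bar{x})^2]}} \sim N(0, 1)$$

$$Z_1 = \frac{\sqrt{\sum (x_i - \bar{x})^2}\,(\hat\beta_1 - \beta_1)}{\sigma} \sim N(0, 1)$$

and

$$V = \frac{(n - 2)\tilde\sigma^2}{\sigma^2} \sim \chi^2(n - 2)$$

Furthermore, each of $Z_0$ and $Z_1$ is independent of $V$. Thus,

$$T_0 = \frac{Z_0}{\sqrt{V/(n - 2)}} = \frac{\hat\beta_0 - \beta_0}{\tilde\sigma\sqrt{\sum x_i^2 \big/ [n\sum (x_i - \bar{x})^2]}} \sim t(n - 2)$$

$$T_1 = \frac{Z_1}{\sqrt{V/(n - 2)}} = \frac{\sqrt{\sum (x_i - \bar{x})^2}\,(\hat\beta_1 - \beta_1)}{\tilde\sigma} \sim t(n - 2)$$

The confidence intervals are derived from the pivotal quantities $T_0$, $T_1$, and $V$. ■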


It also is possible to derive tests of hypotheses based on these pivotal quantities. 


Theorem 15.3.7 Assume that $Y_1, \ldots, Y_n$ are independent, $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, and denote by $t_0$, $t_1$, and $v$ computed values of $T_0$, $T_1$, and $V$ with $\beta_0$, $\beta_1$, and $\sigma^2$ replaced by $\beta_{00}$, $\beta_{10}$, and $\sigma_0^2$, respectively.
1. A size $\alpha$ test of $H_0: \beta_0 = \beta_{00}$ versus $H_a: \beta_0 \neq \beta_{00}$ is to reject $H_0$ if $|t_0| \geq t_{1-\alpha/2}(n - 2)$.
2. A size $\alpha$ test of $H_0: \beta_1 = \beta_{10}$ versus $H_a: \beta_1 \neq \beta_{10}$ is to reject $H_0$ if $|t_1| \geq t_{1-\alpha/2}(n - 2)$.
3. A size $\alpha$ test of $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 \neq \sigma_0^2$ is to reject $H_0$ if $v \geq \chi^2_{1-\alpha/2}(n - 2)$ or $v \leq \chi^2_{\alpha/2}(n - 2)$. ■

One-sided tests can be obtained in a similar manner, but we will not state them here (see Exercise 16).

Example 15.3.3 Consider the auto emission data of Example 15.3.1. Recall that $\sum x_i = 304.377$ and $\sum x_i^2 = 10461.814$, from which we obtain $\sum (x_i - \bar{x})^2 = 10461.814 - 11(27.671)^2 = 2039.287$. We also have that $\hat\beta_0 = 0.266$, $\hat\beta_1 = 0.00158$, and $\tilde\sigma^2 = 0.00030$, so that $\tilde\sigma = 0.017$.

Note that $t_{.975}(9) = 2.262$, $\chi^2_{.025}(9) = 2.70$, and $\chi^2_{.975}(9) = 19.02$. If we apply Theorem 15.3.6, then 95% confidence limits for $\beta_0$ are given by $0.266 \pm (2.262)(0.017)\sqrt{10461.814/[(11)(2039.287)]}$, or $0.266 \pm 0.026$, and a 95% confidence interval is $(0.240, 0.292)$. Similarly, 95% confidence limits for $\beta_1$ are given by $0.00158 \pm (2.262)(0.017)/\sqrt{2039.287}$, or $0.00158 \pm 0.00085$, and a 95% confidence interval is $(0.0007, 0.0024)$. The 95% confidence limits for $\sigma^2$ are $9(0.00030)/19.02 = 0.00014$ and $9(0.00030)/2.70 = 0.00100$, and thus 95% confidence intervals for $\sigma^2$ and $\sigma$ are $(0.00014, 0.00100)$ and $(0.012, 0.032)$, respectively.
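The arithmetic of this example can be reproduced from the summary statistics alone. In the sketch below the percentiles are typed in from the text rather than computed, because Python's standard library has no $t$ or $\chi^2$ quantile function; $\sum (x_i - \bar{x})^2 = 2039.287$ and the rounded $\tilde\sigma = 0.017$ are also taken from the text. With these inputs the half-width for $\beta_0$ comes out near 0.026.

```python
import math

n = 11
sum_x2 = 10461.814                  # sum of x_i^2
sxx = 2039.287                      # sum of (x_i - xbar)^2, as computed in the text
beta0_hat, beta1_hat = 0.266, 0.00158
sigma_tilde2 = 0.00030              # unbiased estimate of sigma^2
sigma_tilde = 0.017                 # rounded square root used in the text
t_975 = 2.262                       # t_.975(9), from the text
chi2_025, chi2_975 = 2.70, 19.02    # chi^2_.025(9), chi^2_.975(9), from the text

# Theorem 15.3.6, parts 1 and 2: half-widths of the t intervals
half0 = t_975 * sigma_tilde * math.sqrt(sum_x2 / (n * sxx))
half1 = t_975 * sigma_tilde / math.sqrt(sxx)

# Theorem 15.3.6, part 3: interval for sigma^2 (square roots give sigma)
var_lo = (n - 2) * sigma_tilde2 / chi2_975
var_hi = (n - 2) * sigma_tilde2 / chi2_025

print(f"beta_0:  {beta0_hat - half0:.3f} to {beta0_hat + half0:.3f}")
print(f"beta_1:  {beta1_hat - half1:.5f} to {beta1_hat + half1:.5f}")
print(f"sigma^2: {var_lo:.5f} to {var_hi:.5f}")
print(f"sigma:   {math.sqrt(var_lo):.3f} to {math.sqrt(var_hi):.3f}")
```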


15.4 GENERAL LINEAR MODEL


Many of the results derived for the simple linear model can be extended to the general linear case. It is not possible to develop the general model conveniently without introducing matrix notation. A few basic results will be stated in matrix notation for the purpose of illustration, but the topic will not be developed fully here. We will denote the transpose of an arbitrary matrix $\mathbf{A}$ by $\mathbf{A}'$. That is, if $\mathbf{A} = \{a_{ij}\}$, then $\mathbf{A}' = \{a_{ji}\}$. Furthermore, if $\mathbf{A}$ is a square nonsingular matrix, then we denote its inverse by $\mathbf{A}^{-1}$. We also will make no distinction between a $1 \times k$ matrix and a $k$-dimensional row vector. Similarly, a $k \times 1$ matrix will be regarded the same as a $k$-dimensional column vector. Thus, if $\mathbf{c}$ represents a $k$-dimensional column vector, then its transpose $\mathbf{c}'$ will represent the corresponding row vector.
Consider the linear regression model (15.2.3) and assume that a response $y_i$ is observed at the values $x_{i0}, x_{i1}, \ldots, x_{ip}$, $i = 1, \ldots, n$, with $n > p + 1$. That is, assume that

$$E(Y_i) = \sum_{j=0}^{p} \beta_j x_{ij}, \qquad \operatorname{Var}(Y_i) = \sigma^2, \qquad \operatorname{Cov}(Y_i, Y_j) = 0, \quad i \neq j$$


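We will denote by $\mathbf{V}$ a matrix such that the $ij$th element is the covariance of the variables $Y_i$ and $Y_j$, $\mathbf{V} = \{\operatorname{Cov}(Y_i, Y_j)\}$. The matrix $\mathbf{V}$ is called the covariance matrix of $Y_1, \ldots, Y_n$. We will define the expected value of a vector of random variables to be the corresponding vector of expected values. For example, if $\mathbf{W} = (W_1, \ldots, W_k)$ is a row vector whose components are random variables, then $E(\mathbf{W}) = (E(W_1), \ldots, E(W_k))$, and similarly for column vectors of random variables. It is possible to reformulate the model in terms of matrices as follows:

$$E(\mathbf{Y}) = \mathbf{X}\boldsymbol\beta, \qquad \mathbf{V} = \sigma^2\mathbf{I} \tag{15.4.1}$$

where $\mathbf{I}$ is the $n \times n$ identity matrix, and $\mathbf{Y}$, $\boldsymbol\beta$, and $\mathbf{X}$ are

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \qquad \boldsymbol\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix}, \qquad \mathbf{X} = \begin{pmatrix} x_{10} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n0} & \cdots & x_{np} \end{pmatrix}$$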


LEAST-SQUARES APPROACH 


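The least-squares estimates are the values $\beta_j = \hat\beta_j$ that minimize the quantity

$$S = \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{p} \beta_j x_{ij}\Big)^2 = (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta) \tag{15.4.2}$$

The approach used with the simple linear model generalizes readily. In other words, if we set the partials of $S$ with respect to the $\beta_j$'s to zero and solve the resulting system of equations, then we obtain the LS estimates. Specifically, we solve

$$\frac{\partial S}{\partial \beta_k} = \sum_{i=1}^{n} 2\Big[y_i - \sum_{j=0}^{p} \beta_j x_{ij}\Big](-x_{ik}) = 0, \qquad k = 0, 1, \ldots, p$$

This system of equations is linear in the $\beta_j$'s, and it is conveniently expressed in terms of the matrix equation

$$\mathbf{X}'\mathbf{Y} = \mathbf{X}'\mathbf{X}\boldsymbol\beta \tag{15.4.3}$$

If the matrix $\mathbf{X}'\mathbf{X}$ is nonsingular, then there exists a unique solution of the form

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \tag{15.4.4}$$

Unless indicated otherwise, we will assume that $\mathbf{X}'\mathbf{X}$ is nonsingular. Of course, a more basic assumption would be that $\mathbf{X}$ has full rank.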
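Equation (15.4.3) is an ordinary linear system, so for small problems it can be solved directly. The sketch below forms $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{Y}$ and solves by Gaussian elimination with partial pivoting, in plain Python so that it is self-contained; in practice a statistical package, or a QR-based solver that avoids forming $\mathbf{X}'\mathbf{X}$, would usually be preferred. The function names are ours, and the data are hypothetical values lying exactly on $y = 1 + 2x$, so the solution should recover $(1, 2)$.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def least_squares(X, y):
    """Form the normal equations X'X beta = X'y of (15.4.3) and solve them."""
    q = len(X[0])   # number of coefficients, p + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(q)] for i in range(q)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(q)]
    return solve(xtx, xty)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x] for x in xs]         # x_i0 = 1 (intercept column), x_i1 = x_i
y = [1.0 + 2.0 * x for x in xs]    # hypothetical data exactly on y = 1 + 2x
beta_hat = least_squares(X, y)
print(beta_hat)                    # approximately [1.0, 2.0]
```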


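The estimators $\hat\beta_j$ are linear functions of $Y_1, \ldots, Y_n$, and it can be shown that they are unbiased. Specifically, it follows from model (15.4.1) and equation (15.4.4) that

$$\begin{aligned} E(\hat{\boldsymbol\beta}) &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{Y}) \\ &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol\beta \\ &= \boldsymbol\beta \end{aligned} \tag{15.4.5}$$

As in the case of the simple linear model, the LS estimates of the $\beta_j$'s for the general model are referred to as the BLUEs.

It also can be shown that the respective variances and covariances of the BLUEs are the elements of the matrix

$$\mathbf{C} = \{\operatorname{Cov}(\hat\beta_i, \hat\beta_j)\} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \tag{15.4.6}$$

and that the BLUE of any linear combination of the $\beta_j$'s, say $\mathbf{r}'\boldsymbol\beta = \sum_{j=0}^{p} r_j\beta_j$, is given by $\mathbf{r}'\hat{\boldsymbol\beta}$ (see Scheffé, 1959).

Theorem 15.4.1 Gauss-Markov If $E(\mathbf{Y}) = \mathbf{X}\boldsymbol\beta$ and $\mathbf{V} = \{\operatorname{Cov}(Y_i, Y_j)\} = \sigma^2\mathbf{I}$, then $\mathbf{r}'\hat{\boldsymbol\beta}$ is the BLUE of $\mathbf{r}'\boldsymbol\beta$, where $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$. ■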


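This theorem can be generalized to the case in which

$$\mathbf{V} = \{\operatorname{Cov}(Y_i, Y_j)\} = \sigma^2\mathbf{A}$$

where $\mathbf{A}$ is a known matrix. That is, the $Y_i$'s may be correlated and have unequal variances as long as $\mathbf{A}$ is known. It turns out that the BLUEs in this case are obtained by minimizing the weighted sum of squares $(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'\mathbf{A}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$. Note that $\sigma^2$ also may be a function of the unknown $\beta_j$'s, say $\sigma^2 = c(\boldsymbol\beta)$.

Theorem 15.4.2 Generalized Gauss-Markov Let $E(\mathbf{Y}) = \mathbf{X}\boldsymbol\beta$ and $\mathbf{V} = \{\operatorname{Cov}(Y_i, Y_j)\} = c(\boldsymbol\beta)\mathbf{A}$, where $\mathbf{A}$ is a matrix of known constants. The generalized least-squares estimates of $\boldsymbol\beta$ are the values that minimize $S = (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'\mathbf{A}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$, and they are given by $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{A}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{A}^{-1}\mathbf{Y}$. Also, $\mathbf{r}'\hat{\boldsymbol\beta}$ is the BLUE of $\mathbf{r}'\boldsymbol\beta$. ■

Note that all of these results have been developed in terms of the means, variances, and covariances and that no other distributional assumptions were made.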
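For the special case in which $\mathbf{A}$ is diagonal (uncorrelated responses with known relative variances $a_{ii}$), the generalized least-squares estimate of Theorem 15.4.2 reduces to weighted least squares with weights $1/a_{ii}$. The sketch below assumes a diagonal $\mathbf{A}$ and a two-column $\mathbf{X}$, so that $(\mathbf{X}'\mathbf{A}^{-1}\mathbf{X})\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{A}^{-1}\mathbf{Y}$ can be solved by Cramer's rule, and checks the result against the weighted-mean form of the weighted regression line. The function name, data, and $a_{ii}$ values are hypothetical.

```python
def gls_diagonal_2col(X, y, a_diag):
    """Solve (X' A^{-1} X) beta = X' A^{-1} y for a diagonal A and a 2-column X."""
    w = [1.0 / ai for ai in a_diag]   # diagonal of A^{-1}
    m = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(2)]
         for i in range(2)]
    r = [sum(wi * row[i] * yi for wi, row, yi in zip(w, X, y)) for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    b0 = (r[0] * m[1][1] - m[0][1] * r[1]) / det   # Cramer's rule
    b1 = (m[0][0] * r[1] - m[1][0] * r[0]) / det
    return b0, b1


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.8, 4.3, 5.9, 8.4, 9.7]        # hypothetical responses
a_diag = [1.0, 1.0, 2.0, 4.0, 4.0]    # hypothetical known relative variances
X = [[1.0, x] for x in xs]
b0, b1 = gls_diagonal_2col(X, ys, a_diag)

# The same estimates from weighted means, with weights 1/a_ii
w = [1.0 / a for a in a_diag]
sw = sum(w)
xw = sum(wi * x for wi, x in zip(w, xs)) / sw
yw = sum(wi * y for wi, y in zip(w, ys)) / sw
b1_check = (sum(wi * (x - xw) * (y - yw) for wi, x, y in zip(w, xs, ys))
            / sum(wi * (x - xw) ** 2 for wi, x in zip(w, xs)))
b0_check = yw - b1_check * xw
print(b0, b1, b0_check, b1_check)
```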


LINEAR FUNCTIONS OF ORDER STATISTICS 


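One interesting application of Theorem 15.4.2, which arises in a slightly different context, is that of finding estimators of location and scale parameters based on linear functions of the order statistics. Consider a pdf of the form

$$f(w; \beta_0, \beta_1) = \frac{1}{\beta_1}\, g\Big(\frac{w - \beta_0}{\beta_1}\Big)$$

where $g$ is a function independent of $\beta_0$ and $\beta_1$, and let

$$Z = \frac{W - \beta_0}{\beta_1} \sim g(z)$$

For an ordered random sample of size $n$, let

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} W_{1:n} \\ \vdots \\ W_{n:n} \end{pmatrix}, \qquad \boldsymbol\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$

It follows that

$$\begin{aligned} E(Y_i) = E(W_{i:n}) &= E(\beta_0 + \beta_1 Z_{i:n}) \\ &= \beta_0 + \beta_1 E(Z_{i:n}) \\ &= \beta_0 + \beta_1 k_i \end{aligned}$$

That is, in this case let

$$\mathbf{X} = \begin{pmatrix} 1 & k_1 \\ \vdots & \vdots \\ 1 & k_n \end{pmatrix}$$

Also,

$$v_{ij} = \operatorname{Cov}(W_{i:n}, W_{j:n}) = \beta_1^2 \operatorname{Cov}(Z_{i:n}, Z_{j:n}) = \beta_1^2 a_{ij}$$

and

$$\mathbf{V} = \beta_1^2\mathbf{A}$$

It then follows that the components of

$$\hat{\boldsymbol\beta} = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = (\mathbf{X}'\mathbf{A}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{A}^{-1}\mathbf{Y}$$

are unbiased estimators of $\beta_0$ and $\beta_1$ that have minimum variance among all linear unbiased functions of the order statistics. The main drawback of this method is that the constants $k_i$ and $a_{ij}$ often are not convenient to compute. In some cases, asymptotic approximations of the constants have been useful.

It is interesting to note that if $\mathbf{A}$ is not used, then the ordinary LS estimates $\boldsymbol\beta^* = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ still are unbiased, although they will not be the BLUEs in this case.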
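As an illustration in which the constants $k_i$ and $a_{ij}$ are available in closed form, suppose (as an assumption for this sketch only) that $g$ is the uniform density on $(0, 1)$, so that $W$ is uniform on $(\beta_0, \beta_0 + \beta_1)$. Then $k_i = i/(n + 1)$ and, for $i \leq j$, $a_{ij} = i(n + 1 - j)/[(n + 1)^2(n + 2)]$. The code computes the weight matrix $(\mathbf{X}'\mathbf{A}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{A}^{-1}$, whose rows express $\hat\beta_0$ and $\hat\beta_1$ as linear combinations of the order statistics. For this distribution only the smallest and largest order statistics receive nonzero weight, giving $\hat\beta_0 = (nW_{1:n} - W_{n:n})/(n - 1)$ and $\hat\beta_1 = (n + 1)(W_{n:n} - W_{1:n})/(n - 1)$. The function names are ours.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def uniform_blue_weights(n):
    """Rows of (X'A^{-1}X)^{-1} X'A^{-1} for order statistics of a uniform sample."""
    k = [i / (n + 1) for i in range(1, n + 1)]                       # k_i = E(Z_{i:n})
    a = [[min(i, j) * (n + 1 - max(i, j)) / ((n + 1) ** 2 * (n + 2))
          for j in range(1, n + 1)] for i in range(1, n + 1)]         # a_ij = Cov(Z_{i:n}, Z_{j:n})
    u = solve(a, [1.0] * n)   # A^{-1} 1
    v = solve(a, k)           # A^{-1} k
    # X'A^{-1}X = [[1'A^{-1}1, 1'A^{-1}k], [k'A^{-1}1, k'A^{-1}k]]
    m00, m01 = sum(u), sum(v)
    m10 = sum(ki * ui for ki, ui in zip(k, u))
    m11 = sum(ki * vi for ki, vi in zip(k, v))
    det = m00 * m11 - m01 * m10
    inv = [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]
    w0 = [inv[0][0] * ui + inv[0][1] * vi for ui, vi in zip(u, v)]   # weights giving beta_0
    w1 = [inv[1][0] * ui + inv[1][1] * vi for ui, vi in zip(u, v)]   # weights giving beta_1
    return w0, w1


w0, w1 = uniform_blue_weights(5)
print([round(c, 6) for c in w0])   # weight only on W_{1:n} and W_{n:n}
print([round(c, 6) for c in w1])
```

With $n = 5$ the closed forms above give weights $(1.25, 0, 0, 0, -0.25)$ for $\hat\beta_0$ and $(-1.5, 0, 0, 0, 1.5)$ for $\hat\beta_1$.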


MAXIMUM LIKELIHOOD APPROACH

In this section we will develop the MLEs and related properties of the general linear model under the assumption that the random errors from different experiments are independent and normal, $\varepsilon_i \sim N(0, \sigma^2)$.

Suppose $Y_1, \ldots, Y_n$ are independent normally distributed random variables, and they are the elements of a vector $\mathbf{Y}$ that satisfies model (15.4.1).

Theorem 15.4.3 If $Y_1, \ldots, Y_n$ are independent with $Y_i \sim N\big(\sum_{j=0}^{p} \beta_j x_{ij}, \sigma^2\big)$ for $i = 1, \ldots, n$, then the MLEs of $\beta_0, \ldots, \beta_p$ and $\sigma^2$ are given by

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \tag{15.4.7}$$

$$\hat\sigma^2 = \frac{(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})}{n} \tag{15.4.8}$$


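Proof
The likelihood function is

$$L = L(\beta_0, \ldots, \beta_p, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big[-\frac{1}{2\sigma^2}\Big(y_i - \sum_{j=0}^{p} \beta_j x_{ij}\Big)^2\Big] \tag{15.4.9}$$

Thus, the log-likelihood is

$$\begin{aligned} \ln L &= -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\Big(y_i - \sum_{j=0}^{p} \beta_j x_{ij}\Big)^2 \\ &= -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta) \\ &= -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}S \end{aligned} \tag{15.4.10}$$

where $S$ is the quantity (15.4.2). Clearly the values of $\beta_0, \ldots, \beta_p$ that maximize function (15.4.9) are the same ones that minimize $S$. Of course, this means that the MLEs of the $\beta_j$'s under the present assumptions are the same as the BLUEs that were derived earlier. The MLE of $\sigma^2$ is obtained by setting the partial derivative of (15.4.10) with respect to $\sigma^2$ to zero and replacing the $\beta_j$'s with the $\hat\beta_j$'s.
Another convenient form for the MLE of $\sigma^2$ (see Exercise 25) is

$$\hat\sigma^2 = \frac{\mathbf{Y}'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})}{n} \tag{15.4.11}$$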
As in the case of the simple linear model, the minimum value of $S$ reflects the amount of variation of the data about the estimated regression function, and this defines the error sum of squares for the general linear regression model, denoted by $\text{SSE} = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})$.
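Because the normal equations (15.4.3) give $\mathbf{X}'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}) = \mathbf{0}$, the SSE can be computed either from the squared residuals or as $\mathbf{Y}'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})$, and the MLE of $\sigma^2$ is SSE$/n$. A small sketch with hypothetical data and a straight-line fit:

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.8, 5.3, 6.9, 9.2, 10.8]   # hypothetical responses
n = len(xs)

# Straight-line fit from the closed-form solution of the normal equations
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
sse = sum(e * e for e in resid)                  # (Y - Xb)'(Y - Xb)
sse_alt = sum(y * e for y, e in zip(ys, resid))  # Y'(Y - Xb)
normal_eq = (sum(resid), sum(x * e for x, e in zip(xs, resid)))   # X'(Y - Xb), ~(0, 0)

sigma2_mle = sse / n
print(sse, sse_alt, normal_eq, sigma2_mle)
```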

Many of the results that were derived for the MLEs in the case of the simple 
linear model have counterparts for the general linear model. However, to state 
some of these results, it is necessary to introduce a multivariate generalization of 
the normal distribution. 


MULTIVARIATE NORMAL DISTRIBUTION 


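In Chapter 5, a bivariate generalization of the normal distribution was presented. Specifically, a pair of continuous random variables $X_1$ and $X_2$ are said to be bivariate normal, $\text{BVN}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, if they have a joint pdf of the form

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\Big(-\frac{Q}{2}\Big) \tag{15.4.12}$$

with

$$Q = \frac{1}{1 - \rho^2}\left[\Big(\frac{x_1 - \mu_1}{\sigma_1}\Big)^2 - 2\rho\Big(\frac{x_1 - \mu_1}{\sigma_1}\Big)\Big(\frac{x_2 - \mu_2}{\sigma_2}\Big) + \Big(\frac{x_2 - \mu_2}{\sigma_2}\Big)^2\right]$$

This pdf can be expressed more conveniently using matrix notation. In particular, we define the following vectors and matrices:

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \boldsymbol\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \mathbf{V} = \{\operatorname{Cov}(X_i, X_j)\} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

It can be shown that $Q = (\mathbf{x} - \boldsymbol\mu)'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol\mu)$ and that the determinant of $\mathbf{V}$ is $|\mathbf{V}| = \sigma_1^2\sigma_2^2(1 - \rho^2)$. Notice that we are assuming that $\mathbf{V}$ is nonsingular. We will restrict our attention to this case throughout this discussion. This provides a way to generalize the normal distribution to a $k$-dimensional version.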

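Definition 15.4.1

A set of continuous random variables $X_1, \ldots, X_k$ are said to have a multivariate normal or $k$-variate normal distribution if the joint pdf has the form

$$f(x_1, \ldots, x_k) = \frac{1}{(2\pi)^{k/2}|\mathbf{V}|^{1/2}} \exp\Big[-\frac{1}{2}(\mathbf{x} - \boldsymbol\mu)'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol\mu)\Big] \tag{15.4.13}$$

with $\mathbf{x}' = (x_1, \ldots, x_k)$, $\boldsymbol\mu' = (\mu_1, \ldots, \mu_k)$, and $\mathbf{V} = \{\operatorname{Cov}(X_i, X_j)\}$, and where $\mu_i = E(X_i)$ and $\mathbf{V}$ is a $k \times k$ nonsingular covariance matrix.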
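The equivalence between (15.4.12) and the matrix form (15.4.13) with $k = 2$ can be spot-checked numerically. The sketch evaluates both densities at an arbitrary point with arbitrary parameter values; it handles only the $2 \times 2$ case, using the explicit inverse and determinant of $\mathbf{V}$. The function names are ours.

```python
import math


def mvn_pdf_2(x, mu, v):
    """Density (15.4.13) with k = 2, using the explicit 2 x 2 inverse of V."""
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    inv = [[v[1][1] / det, -v[0][1] / det],
           [-v[1][0] / det, v[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    k = 2
    return math.exp(-0.5 * q) / ((2 * math.pi) ** (k / 2) * math.sqrt(det))


def bvn_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Density (15.4.12) written in terms of standardized variables."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))


mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.5, 0.6   # arbitrary parameters
v = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
p_matrix = mvn_pdf_2([0.3, -1.7], [mu1, mu2], v)
p_scalar = bvn_pdf(0.3, -1.7, mu1, mu2, s1, s2, rho)
print(p_matrix, p_scalar)   # the two values agree
```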
Notice that for a set of multivariate normal random variables $X_1, \ldots, X_k$, the distribution is determined completely by the mean vector $\boldsymbol\mu$ and covariance matrix $\mathbf{V}$. An important property of multivariate normal random variables is that their marginals are normal, $X_i \sim N(\mu_i, \sigma_i^2)$ (see Exercise 27).

Another quantity that was encountered in this development is the variable $Q = (\mathbf{x} - \boldsymbol\mu)'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol\mu)$. This is a special case of a function called a quadratic form. More generally, a quadratic form in the $k$ variables $x_1, \ldots, x_k$ is a function of the form

$$Q = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij} x_i x_j \tag{15.4.14}$$

where the $a_{ij}$'s are constants. It often is more convenient to express $Q$ in matrix notation. Specifically, if $\mathbf{A} = \{a_{ij}\}$ and $\mathbf{x}' = (x_1, \ldots, x_k)$, then $Q = \mathbf{x}'\mathbf{A}\mathbf{x}$. Strictly speaking, $Q = (\mathbf{x} - \boldsymbol\mu)'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol\mu)$ is a quadratic form in the differences $x_1 - \mu_1, \ldots, x_k - \mu_k$ rather than in $x_1, \ldots, x_k$. An example of a quadratic form in the $x_i$'s would be $Q = \mathbf{x}'\mathbf{I}\mathbf{x} = \sum x_i^2$. Some quadratic forms in the $y_i$'s that have been encountered in this section are $\hat\sigma^2$ and SSE.
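The double sum (15.4.14) and the matrix product $\mathbf{x}'\mathbf{A}\mathbf{x}$ are the same computation, as the following short sketch confirms for an arbitrary (not necessarily symmetric) matrix; with $\mathbf{A} = \mathbf{I}$ it reduces to $\sum x_i^2$. The function names and numbers are illustrative only.

```python
def quad_form_sum(a, x):
    """Q = sum_i sum_j a_ij x_i x_j, as in (15.4.14)."""
    k = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(k) for j in range(k))


def quad_form_matrix(a, x):
    """Q = x'(Ax)."""
    ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
    return sum(xi * axi for xi, axi in zip(x, ax))


a = [[2.0, 1.0, 0.0],
     [3.0, 1.0, 4.0],
     [0.0, -1.0, 5.0]]    # arbitrary constants a_ij
x = [1.0, 2.0, -1.0]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(quad_form_sum(a, x), quad_form_matrix(a, x))   # 13.0 13.0
print(quad_form_sum(identity, x))                    # 6.0, i.e. 1 + 4 + 1
```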


PROPERTIES OF THE ESTIMATORS 


Most of the properties of the MLEs for the simple linear model can be extended 
using the approach of Section 15.3, but the details are more complicated for the 
higher-dimensional problem. We will state some of the properties of the MLEs in 
the following theorems. 


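Theorem 15.4.4 Under the assumptions of Theorem 15.4.3 the following properties hold:

1. The MLEs $\hat\beta_0, \ldots, \hat\beta_p$ and $\hat\sigma^2$ are jointly complete and sufficient.
2. $\hat{\boldsymbol\beta}$ has a multivariate normal distribution with mean vector $\boldsymbol\beta$ and covariance matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.
3. $n\hat\sigma^2/\sigma^2 \sim \chi^2(n - p - 1)$.
4. $\hat{\boldsymbol\beta}$ and $\hat\sigma^2$ are independent.
5. Each $\hat\beta_j$ is the UMVUE of $\beta_j$.
6. $\tilde\sigma^2 = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})/(n - p - 1)$ is the UMVUE of $\sigma^2$. ■

It also is possible to derive confidence intervals for the parameters. In the following theorem, let $\mathbf{A} = \{a_{ij}\} = (\mathbf{X}'\mathbf{X})^{-1}$ so that $\mathbf{C} = \{\operatorname{Cov}(\hat\beta_i, \hat\beta_j)\} = \sigma^2\mathbf{A}$.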
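Theorem 15.4.5 Under the assumptions of Theorem 15.4.3, the following intervals are $\gamma \times 100\%$ confidence intervals for the $\beta_j$'s and $\sigma^2$:

1. $$\left(\hat\beta_j - t_{(1+\gamma)/2}\sqrt{\tilde\sigma^2 a_{jj}},\ \ \hat\beta_j + t_{(1+\gamma)/2}\sqrt{\tilde\sigma^2 a_{jj}}\right)$$
2. $$\left(\frac{(n - p - 1)\tilde\sigma^2}{\chi^2_{(1+\gamma)/2}},\ \ \frac{(n - p - 1)\tilde\sigma^2}{\chi^2_{(1-\gamma)/2}}\right)$$

where the respective $t$ and $\chi^2$ percentiles have $n - p - 1$ degrees of freedom.

Proof
It follows from Theorem 15.4.4 that $\hat\beta_j \sim N(\beta_j, \sigma^2 a_{jj})$, so that

$$Z = \frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2 a_{jj}}} \sim N(0, 1), \qquad V = \frac{(n - p - 1)\tilde\sigma^2}{\sigma^2} \sim \chi^2(n - p - 1)$$

Furthermore, because $Z$ and $V$ are independent,

$$T = \frac{Z}{\sqrt{V/(n - p - 1)}} = \frac{\hat\beta_j - \beta_j}{\sqrt{\tilde\sigma^2 a_{jj}}} \sim t(n - p - 1)$$

The confidence intervals follow from these results.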
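For the simple linear model ($p = 1$) the diagonal elements of $\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}$ are $a_{00} = \sum x_i^2/[n\sum (x_i - \bar{x})^2]$ and $a_{11} = 1/\sum (x_i - \bar{x})^2$, so part 1 of Theorem 15.4.5 reproduces parts 1 and 2 of Theorem 15.3.6. The sketch below checks this with hypothetical design points, using the explicit $2 \times 2$ inverse; `ci_beta_j` is a hypothetical helper, and the $t$ percentile must be supplied by the caller.

```python
import math


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def ci_beta_j(beta_hat_j, sigma_tilde2, a_jj, t_pct):
    """Theorem 15.4.5, part 1: beta_hat_j -/+ t * sqrt(sigma_tilde^2 * a_jj)."""
    half = t_pct * math.sqrt(sigma_tilde2 * a_jj)
    return beta_hat_j - half, beta_hat_j + half


x = [1.0, 2.0, 4.0, 7.0, 11.0]   # hypothetical design points
n = len(x)
sum_x2 = sum(xi * xi for xi in x)
xtx = [[n, sum(x)], [sum(x), sum_x2]]
a = inverse_2x2(xtx)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

print(a[0][0], sum_x2 / (n * sxx))      # equal
print(a[1][1], 1 / sxx)                 # equal
print(ci_beta_j(1.0, 4.0, 0.25, 2.0))   # illustrative inputs: (-1.0, 3.0)
```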


It also is possible to derive tests of hypotheses based on these pivotal quan- 
tities. 


Theorem 15.4.6 Assume the conditions of Theorem 15.4.3 and denote by $t$ and $v$ computed values of $T$ and $V$ with $\beta_j = \beta_{j0}$ and $\sigma^2 = \sigma_0^2$.

1. A size $\alpha$ test of $H_0: \beta_j = \beta_{j0}$ versus $H_a: \beta_j \neq \beta_{j0}$ is to reject $H_0$ if $|t| \geq t_{1-\alpha/2}(n - p - 1)$.
2. A size $\alpha$ test of $H_0: \beta_j = \beta_{j0}$ versus $H_a: \beta_j > \beta_{j0}$ is to reject $H_0$ if $t \geq t_{1-\alpha}(n - p - 1)$.
3. A size $\alpha$ test of $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 \neq \sigma_0^2$ is to reject $H_0$ if $v \geq \chi^2_{1-\alpha/2}(n - p - 1)$ or $v \leq \chi^2_{\alpha/2}(n - p - 1)$.
4. A size $\alpha$ test of $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 > \sigma_0^2$ is to reject $H_0$ if $v \geq \chi^2_{1-\alpha}(n - p - 1)$.

Lower one-sided tests can be obtained in a similar manner, but we will not state them here.

Example 15.4.1 Consider the auto emission data of Example 15.3.1. Recall from that example that we carried out the analysis under the assumption of a simple linear model. Although there is no theoretical reason for assuming that the relationship between mileage and hydrocarbon emissions is nonlinear in the variables, the plotted data in Figure 15.1 suggest such a possibility, particularly after approximately 40,000 miles. An obvious extension of the original analysis would be to consider a second-degree polynomial model. That is, each measurement $y_i$ is an observation on a random variable $Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$, with independent normal errors, $\varepsilon_i \sim N(0, \sigma^2)$. This can be formulated in terms of the general linear model (15.4.1) using the matrices

$$\boldsymbol\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}, \qquad \mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}$$
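with $n = 11$. Recall from equation (15.4.4) that if the matrix $\mathbf{X}'\mathbf{X}$ is nonsingular, then the LS estimates, which also are the ML estimates, are the components of the vector $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$. Although the recommended procedure is to use a statistical software package such as Minitab or SAS, it is possible to give explicit formulas for the estimates. In particular,

$$\hat\beta_1 = \frac{s_{1y}s_{22} - s_{2y}s_{12}}{s_{11}s_{22} - s_{12}^2}, \qquad \hat\beta_2 = \frac{s_{2y}s_{11} - s_{1y}s_{12}}{s_{11}s_{22} - s_{12}^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} - \hat\beta_2\overline{x^2}$$

with

$$\bar{x} = \frac{\sum x_i}{n}, \quad \overline{x^2} = \frac{\sum x_i^2}{n}, \quad s_{1y} = \sum x_i y_i - n\bar{x}\bar{y}, \quad s_{2y} = \sum x_i^2 y_i - n\overline{x^2}\,\bar{y},$$

$$s_{11} = \sum x_i^2 - n\bar{x}^2, \quad s_{12} = \sum x_i^3 - n\bar{x}\,\overline{x^2}, \quad \text{and} \quad s_{22} = \sum x_i^4 - n\big(\overline{x^2}\big)^2.$$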
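The explicit formulas for the second-degree fit translate directly into code. The sketch below implements them (the function name `quadratic_fit` is ours) and checks them on hypothetical data lying exactly on $y = 1 + 2x - 0.5x^2$, for which the estimates should reproduce the true coefficients. The 11 emission measurements are not listed in this section, so the sketch does not attempt to reproduce the estimates reported for that data.

```python
def quadratic_fit(x, y):
    """LS estimates for y = b0 + b1 x + b2 x^2 via the s-quantities in the text."""
    n = len(x)
    xbar = sum(x) / n
    x2bar = sum(xi ** 2 for xi in x) / n
    ybar = sum(y) / n
    s1y = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    s2y = sum(xi ** 2 * yi for xi, yi in zip(x, y)) - n * x2bar * ybar
    s11 = sum(xi ** 2 for xi in x) - n * xbar ** 2
    s12 = sum(xi ** 3 for xi in x) - n * xbar * x2bar
    s22 = sum(xi ** 4 for xi in x) - n * x2bar ** 2
    den = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / den
    b2 = (s2y * s11 - s1y * s12) / den
    b0 = ybar - b1 * xbar - b2 * x2bar
    return b0, b1, b2


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]   # hypothetical exact quadratic data
b0, b1, b2 = quadratic_fit(xs, ys)
print(b0, b1, b2)   # approximately 1.0, 2.0, -0.5
```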


The LS estimates of the regression coefficients are $\hat\beta_0 = 0.2347$, $\hat\beta_1 = 0.0046$, and $\hat\beta_2 = -0.000055$, yielding the regression function $\hat{y} = 0.2347 + 0.0046x - 0.000055x^2$. A graph of $\hat{y}$ is provided in Figure 15.2 along with a plot of the data. For comparison, the graph of the linear regression function obtained in Example 15.3.1 also is shown as a dashed line.

An obvious question would be whether the second-degree term is necessary. One approach to answering this would be to test whether $\beta_2$ is significantly different from zero. According to Theorem 15.4.6, a size $\alpha = .05$ test of $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$ is to reject $H_0$ if $|t| \geq 2.306$. Because in this example $t = -2.235$, $\hat\beta_2$ does not differ significantly from zero, at least at the .05 level. It should be noted, however, that the test would reject at the .10 level, because $|t| = 2.235$ exceeds $t_{.95}(8) = 1.860$.
FIGURE 15.2 Hydrocarbon emissions as a function of accumulated mileage (second-degree polynomial fit)
Another question of interest involves joint tests and confidence regions for two or more of the coefficients.


JOINT TESTS AND CONFIDENCE CONTOURS 


Part 1 of Theorem 15.4.6 provides a means for testing individual regression coef- 
ficients with the other parameters of the model regarded as unknown nuisance 
parameters. It also is possible to develop methods for simultaneously testing two 
or more of the regression coefficients. 


Example 15.4.2 Although the plots of the hydrocarbon data in Figures 15.1 and 15.2 strongly suggest that some sort of regression model is appropriate, it would be desirable in general to test whether terms beyond the constant term $\beta_0$ really are needed. Such a test can be constructed using the approach of the generalized likelihood ratio (GLR) test of Chapter 12. In the auto emissions example, suppose it is desired to test jointly the hypothesis $H_0: \beta_1 = \beta_2 = 0$ versus the alternative that at least one of these coefficients is nonzero. The parameters $\beta_0$ and $\sigma^2$ would be unknown. Let $\Omega$ be the set of all quadruples $(\beta_0, \beta_1, \beta_2, \sigma^2)$ with $-\infty < \beta_j < \infty$ and $\sigma^2 > 0$. Furthermore, let $\Omega_0$ be the subset of quadruples of the form $(\beta_0, 0, 0, \sigma^2)$. Under the assumption of independent errors, $\varepsilon_i \sim N(0, \sigma^2)$, over the subspace $\Omega_0$ the MLEs of $\beta_0$ and $\sigma^2$ are $\hat\beta_{00} = \bar{y}$ and $\hat\sigma_0^2 = \sum (y_i - \bar{y})^2/n$, while over the unrestricted space $\Omega$ the joint MLEs are the components of $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ and $\hat\sigma^2 = \sum (y_i - \hat{y}_i)^2/n$ with $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i + \hat\beta_2 x_i^2$. Consequently, the GLR is

$$\lambda(\mathbf{y}) = \frac{f(\mathbf{y}; \hat\beta_{00}, 0, 0, \hat\sigma_0^2)}{f(\mathbf{y}; \hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\sigma^2)} = \frac{[(2\pi)\hat\sigma_0^2]^{-n/2}\exp\big[-(1/2\hat\sigma_0^2)\sum (y_i - \bar{y})^2\big]}{[(2\pi)\hat\sigma^2]^{-n/2}\exp\big[-(1/2\hat\sigma^2)\sum (y_i - \hat{y}_i)^2\big]} = \Big(\frac{\hat\sigma_0^2}{\hat\sigma^2}\Big)^{-n/2}$$

where the last step uses the fact that both exponents equal $-n/2$. Thus, the GLR test would reject $H_0: \beta_1 = \beta_2 = 0$ if $\lambda(\mathbf{y}) \leq k$, where $k$ is chosen to provide a size $\alpha$ test. A simple way to proceed would be to use the approximate test given by equation (12.8.3). The test is based on the statistic $-2\ln\lambda(\mathbf{y}) = n\ln(\hat\sigma_0^2/\hat\sigma^2) = 16.67$. Because $r = 2$ parameters are being tested, the approximate critical value is $\chi^2_{.95}(2) = 5.99$, and the test rejects $H_0$. In fact, the test is highly significant because the $p$-value is .0002. We will see shortly that an exact test can be constructed, but first we will consider joint confidence regions.
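The large-sample GLR computation can be sketched from the two residual sums of squares, since $\hat\sigma_0^2/\hat\sigma^2 = S(\hat{\boldsymbol\beta}_0)/S(\hat{\boldsymbol\beta})$. The values 0.00796 and 0.00175 below are the rounded sums of squares reported for these data in Example 15.4.4, so the statistic comes out near 16.66 rather than exactly 16.67. For $r = 2$ the $\chi^2(2)$ upper-tail probability has the closed form $e^{-c/2}$, which avoids any need for a statistics library.

```python
import math

n = 11
s_restricted = 0.00796   # sum (y_i - ybar)^2, i.e. n * sigma_hat_0^2
s_full = 0.00175         # residual SS of the quadratic fit, i.e. n * sigma_hat^2

glr_stat = n * math.log(s_restricted / s_full)   # -2 ln lambda(y)
p_value = math.exp(-glr_stat / 2)                # P(chi^2(2) > glr_stat)
critical = 5.99                                  # chi^2_.95(2), from the text

print(round(glr_stat, 2), p_value, glr_stat > critical)
```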

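Consider the quantity $S = S(\boldsymbol\beta) = (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$, which is the sum of squares that we minimized to obtain the LS and ML estimates of the regression coefficients. In other words, the minimum value of $S$ is $S(\hat{\boldsymbol\beta})$. Specifically, it can be shown (see Exercise 29) that

$$S(\boldsymbol\beta) = S(\hat{\boldsymbol\beta}) + (\hat{\boldsymbol\beta} - \boldsymbol\beta)'(\mathbf{X}'\mathbf{X})(\hat{\boldsymbol\beta} - \boldsymbol\beta) \tag{15.4.15}$$

With the assumptions of Theorem 15.4.4 we have that $S(\boldsymbol\beta)/\sigma^2 \sim \chi^2(n)$ and $S(\hat{\boldsymbol\beta})/\sigma^2 = n\hat\sigma^2/\sigma^2 \sim \chi^2(n - p - 1)$. Furthermore, from Part 4 of Theorem 15.4.4 we know that $\hat{\boldsymbol\beta}$ and $\hat\sigma^2$ are independent; with equation (15.4.15), this implies that $S(\boldsymbol\beta) - S(\hat{\boldsymbol\beta}) = (\hat{\boldsymbol\beta} - \boldsymbol\beta)'(\mathbf{X}'\mathbf{X})(\hat{\boldsymbol\beta} - \boldsymbol\beta)$ and $S(\hat{\boldsymbol\beta}) = n\hat\sigma^2$ are independent. Based on the rationale of Theorem 15.3.4, it follows that $[S(\boldsymbol\beta) - S(\hat{\boldsymbol\beta})]/\sigma^2 \sim \chi^2(n - (n - p - 1)) = \chi^2(p + 1)$. Thus, we can derive an $F$-variable

$$\frac{[S(\boldsymbol\beta) - S(\hat{\boldsymbol\beta})]/(p + 1)}{S(\hat{\boldsymbol\beta})/(n - p - 1)} \sim F(p + 1, n - p - 1) \tag{15.4.16}$$

It follows that a $\gamma \times 100\%$ confidence region for $\beta_0, \beta_1, \ldots, \beta_p$ is defined by the inequality

$$S(\boldsymbol\beta) \leq S(\hat{\boldsymbol\beta})\Big[1 + \frac{p + 1}{n - p - 1} f_\gamma(p + 1, n - p - 1)\Big] \tag{15.4.17}$$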
The boundary of such a region is known as a confidence contour. Of course, another way to express such a confidence region is in terms of the quantity $Q = (\hat{\boldsymbol\beta} - \boldsymbol\beta)'(\mathbf{X}'\mathbf{X})(\hat{\boldsymbol\beta} - \boldsymbol\beta)$, which is a quadratic form in the differences $\hat\beta_j - \beta_j$. In particular, we have

$$(\hat{\boldsymbol\beta} - \boldsymbol\beta)'(\mathbf{X}'\mathbf{X})(\hat{\boldsymbol\beta} - \boldsymbol\beta) \leq (p + 1)\tilde\sigma^2 f_\gamma(p + 1, n - p - 1) \tag{15.4.18}$$

For the important special case of the simple linear model, the corresponding confidence contour is an ordinary two-dimensional ellipse with center at $(\hat\beta_0, \hat\beta_1)$.


Example 15.4.3 Consider the auto emission data of Example 15.3.1. To obtain a confidence contour by quantity (15.4.18), it is necessary to find $\mathbf{X}'\mathbf{X}$. For the simple linear model we have

$$\mathbf{X}' = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}$$

and consequently

$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} = \begin{pmatrix} n & n\bar{x} \\ n\bar{x} & n\overline{x^2} \end{pmatrix}$$

From this we obtain $Q = n(\hat\beta_0 - \beta_0)^2 + 2n\bar{x}(\hat\beta_0 - \beta_0)(\hat\beta_1 - \beta_1) + n\overline{x^2}(\hat\beta_1 - \beta_1)^2$, verifying the earlier comment that the confidence contour is an ellipse with center at $(\hat\beta_0, \hat\beta_1)$. For this example, $n = 11$, $\bar{x} = 27.67$, and $\overline{x^2} = 951.07$. Using the percentile $f_{.95}(2, 9)$, the graph of a 95% confidence contour for this data is shown in Figure 15.3. Of course, the same confidence region would be obtained using inequality (15.4.17).

FIGURE 15.3 95% confidence contour for $(\beta_0, \beta_1)$, centered at $(\hat\beta_0, \hat\beta_1) = (.266, .00158)$ (horizontal axis: parameter $\beta_0$; vertical axis: parameter $\beta_1$)
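Inequality (15.4.18) gives a direct membership test for the contour in Figure 15.3. The sketch below uses the summary values of this example, with $p + 1 = 2$ and $\tilde\sigma^2 = 0.00030$ from Example 15.3.3; the percentile $f_{.95}(2, 9) \approx 4.26$ is taken from standard $F$ tables and is an input here, not something the code computes. The function names are ours.

```python
n = 11
xbar, x2bar = 27.67, 951.07        # xbar and the mean of the x_i^2
b0_hat, b1_hat = 0.266, 0.00158
sigma_tilde2 = 0.00030
f_95 = 4.26                        # f_.95(2, 9) from F tables (assumed input)
bound = 2 * sigma_tilde2 * f_95    # (p + 1) sigma_tilde^2 f, with p = 1


def q_value(beta0, beta1):
    """Q = (b_hat - b)' X'X (b_hat - b) for the simple linear model."""
    d0, d1 = b0_hat - beta0, b1_hat - beta1
    return n * d0 ** 2 + 2 * n * xbar * d0 * d1 + n * x2bar * d1 ** 2


def in_region(beta0, beta1):
    return q_value(beta0, beta1) <= bound


print(in_region(0.266, 0.00158))   # True: the center of the ellipse
print(in_region(0.266, 0.0012))    # True
print(in_region(0.266, 0.0))       # False: beta_1 = 0 lies outside the contour
```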


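The above results also suggest how to construct a joint test of hypotheses about all of the $\beta_j$'s. For example, a size $\alpha$ test of the hypothesis $H_0: \beta_0 = \beta_1 = \cdots = \beta_p = 0$ versus the alternative that at least one of the $\beta_j$'s is nonzero would be to reject if

$$\frac{[S(\mathbf{0}) - S(\hat{\boldsymbol\beta})]/(p + 1)}{S(\hat{\boldsymbol\beta})/(n - p - 1)} > f_{1-\alpha}(p + 1, n - p - 1) \tag{15.4.19}$$

where $\mathbf{0}$ is the $(p + 1)$-dimensional column vector with all zero components. This can be generalized to provide a test that the $\beta_j$'s in some subset are all zero. Consider a null hypothesis of the form $H_0: \beta_m = \cdots = \beta_p = 0$, where $0 < m < p$. This is a generalization of the test of Example 15.4.2, and in fact the GLR approach extends immediately to the case of the general linear model with $p + 1$ coefficients. The parameters $\beta_0, \ldots, \beta_{m-1}$ and $\sigma^2$ would be unknown. Let $\Omega$ be the set of all $(p + 2)$-tuples $(\beta_0, \beta_1, \ldots, \beta_p, \sigma^2)$ with $-\infty < \beta_j < \infty$ and $\sigma^2 > 0$, and let $\Omega_0$ be the subset of $\Omega$ such that $\beta_j = 0$ for $j = m, \ldots, p$. We assume independent errors, $\varepsilon_i \sim N(0, \sigma^2)$. The MLEs over the subspace $\Omega_0$ are the components of $\hat{\boldsymbol\beta}_0 = (\mathbf{X}_0'\mathbf{X}_0)^{-1}\mathbf{X}_0'\mathbf{Y}$, where $\mathbf{X}_0$ is the $n \times m$ matrix consisting of the first $m$ columns of $\mathbf{X}$, and $\hat\sigma_0^2 = S(\hat{\boldsymbol\beta}_0)/n = \sum_{i=1}^{n} (y_i - \hat{y}_{0i})^2/n$ with $\hat{y}_{0i} = \sum_{j=0}^{m-1} \hat\beta_{0j} x_{ij}$. On the other hand, over the unrestricted space $\Omega$ the joint MLEs are the usual $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ and $\hat\sigma^2 = S(\hat{\boldsymbol\beta})/n = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2/n$ with $\hat{y}_i = \sum_{j=0}^{p} \hat\beta_j x_{ij}$. The GLR derivation of Example 15.4.2 extends easily to yield a GLR of the form $\lambda(\mathbf{y}) = (\hat\sigma_0^2/\hat\sigma^2)^{-n/2}$. Thus, the GLR test would reject $H_0: \beta_m = \cdots = \beta_p = 0$ if $\lambda(\mathbf{y}) \leq k$, where $k$ is chosen to provide a size $\alpha$ test. As in the example, we could employ the chi-square approximation of $-2\ln\lambda(\mathbf{y})$, but an exact test is possible. Note that

$$\frac{n - p - 1}{p - m + 1}\big[\lambda(\mathbf{y})^{-2/n} - 1\big] = \frac{[S(\hat{\boldsymbol\beta}_0) - S(\hat{\boldsymbol\beta})]/(p - m + 1)}{S(\hat{\boldsymbol\beta})/(n - p - 1)} \tag{15.4.20}$$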
which is a decreasing function of $\lambda(\mathbf{y})$. Furthermore, under the null hypothesis $H_0: \beta_m = \cdots = \beta_p = 0$, it can be shown that the ratio on the right of equation (15.4.20) is distributed as $F(p - m + 1, n - p - 1)$. Consequently, a size $\alpha$ test, which is equivalent to the GLR test, would reject $H_0$ if

$$\frac{[S(\hat{\boldsymbol\beta}_0) - S(\hat{\boldsymbol\beta})]/(p - m + 1)}{S(\hat{\boldsymbol\beta})/(n - p - 1)} > f_{1-\alpha}(p - m + 1, n - p - 1) \tag{15.4.21}$$

For additional information on these distributional results, see Scheffé (1959). We summarize the above remarks in the following theorem.


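Theorem 15.4.7 Assume the conditions of Theorem 15.4.4 and let $S(\boldsymbol\beta) = (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$.

1. A $\gamma \times 100\%$ confidence region for $\beta_0, \beta_1, \ldots, \beta_p$ is given by the set of solutions to the inequality
$$S(\boldsymbol\beta) \leq S(\hat{\boldsymbol\beta})\Big[1 + \frac{p + 1}{n - p - 1} f_\gamma(p + 1, n - p - 1)\Big]$$
2. A size $\alpha$ test of $H_0: \beta_m = \cdots = \beta_p = 0$, where $0 < m < p$, would reject $H_0$ if
$$\frac{[S(\hat{\boldsymbol\beta}_0) - S(\hat{\boldsymbol\beta})]/(p - m + 1)}{S(\hat{\boldsymbol\beta})/(n - p - 1)} > f_{1-\alpha}(p - m + 1, n - p - 1)$$
where $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ and $\hat\sigma^2 = S(\hat{\boldsymbol\beta})/n$ are the MLEs over the full parameter space $\Omega = \{(\beta_0, \beta_1, \ldots, \beta_p, \sigma^2): -\infty < \beta_j < \infty, \sigma^2 > 0\}$, and $\hat{\boldsymbol\beta}_0$ and $\hat\sigma_0^2$ are the MLEs over the subset $\Omega_0$ of $\Omega$ such that $\beta_j = 0$ for $j = m, \ldots, p$. Furthermore, $\hat{\boldsymbol\beta}_0 = (\mathbf{X}_0'\mathbf{X}_0)^{-1}\mathbf{X}_0'\mathbf{Y}$, where $\mathbf{X}_0$ is the $n \times m$ matrix consisting of the first $m$ columns of $\mathbf{X}$, $\hat\sigma_0^2 = S(\hat{\boldsymbol\beta}_0)/n$, and the resulting test is equivalent to the GLR test.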


Example 15.4.4 In Example 15.4.2 we considered a GLR test using the auto emissions data. There we used the chi-square approximation of $-2\ln\lambda(\mathbf{y})$ to carry out the test. Now we can perform an exact test using the results of Theorem 15.4.7. Recall that it was desired to test the hypothesis $H_0: \beta_1 = \beta_2 = 0$ versus the alternative that at least one of these coefficients is nonzero. Over the subspace $\Omega_0$ there is only one undetermined coefficient, $\beta_0$, with MLE $\hat\beta_{00} = \bar{y}$ over $\Omega_0$, and in this case $S(\hat{\boldsymbol\beta}_0) = \sum (y_i - \bar{y})^2 = 0.00796$, while over the unrestricted space $\Omega$ the MLE is $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ and $S(\hat{\boldsymbol\beta}) = 0.00175$. Because $n = 11$, $p = 2$, and $m = 1$ in this example, the value of the $F$ statistic is

$$\frac{[0.00796 - 0.00175]/2}{0.00175/8} = 14.194$$

which would reject $H_0$ at the $\alpha = .05$ level because $f_{.95}(2, 8) = 4.46$. As in the test of Example 15.4.2, evidence in favor of using a regression model is overwhelming because the $p$-value is .0023. It is interesting to note that both procedures led to a rejection, and in both cases the result was highly significant.

15.5 ANALYSIS OF BIVARIATE DATA


To this point, we have assumed that the variable $x$ can be fixed or measured without error by the experimenter. However, there are situations in which both $X$ and $Y$ are random variables. Assume $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate population with pdf $f(x, y)$. That is, each pair has the same joint pdf, and there is independence between pairs, but the variables within a pair may be dependent.

Our purpose in this section is to show how the results about the simple linear model can be used in the analysis of data from bivariate distributions and to develop methods for testing the hypothesis that $X$ and $Y$ are independent. We first define an estimator of the correlation coefficient $\rho$.


Definition 15.5.1

If $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate population, then the sample correlation coefficient is

$$R = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} \tag{15.5.1}$$

The corresponding quantity computed from paired data $(x_1, y_1), \ldots, (x_n, y_n)$, denoted by $r$, is an estimate of $\rho$.
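Formula (15.5.1) in code, as a minimal sketch; the small data sets are hypothetical and chosen so that $r$ is exactly $1$, $-1$, and $0$.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r, formula (15.5.1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(sample_correlation([1, 2, 3, 4], [3, 5, 7, 9]))   # 1.0  (y = 2x + 1)
print(sample_correlation([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
print(sample_correlation([-1, 0, 1], [1, 0, 1]))        # 0.0
```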


We will consider the important special case in which the paired data are observations of a random sample from a bivariate normal population, $(X, Y) \sim \text{BVN}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. We know from Theorem 5.4.8 that the conditional distribution of $Y$ given $X = x$ is normal,

$$Y \mid x \sim N\Big(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1),\ \sigma_2^2(1 - \rho^2)\Big)$$

This can be related to a simple linear model because $E(Y \mid x) = \beta_0 + \beta_1 x$ with $\beta_0 = \mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1$, $\beta_1 = \rho\frac{\sigma_2}{\sigma_1}$, and $\operatorname{Var}(Y \mid x) = \sigma_2^2(1 - \rho^2)$. Thus, we have the following theorem.

Theorem 15.5.1 Consider a random sample from a bivariate normal population with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient $\rho$, and let $\mathbf{X} = (X_1, \ldots, X_n)$ and $\mathbf{x} = (x_1, \ldots, x_n)$. Then, conditional on $\mathbf{X} = \mathbf{x}$, the variables $Y_1, \ldots, Y_n$ are distributed as independent variables $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ with $\beta_0 = \mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1$, $\beta_1 = \rho\frac{\sigma_2}{\sigma_1}$, and $\sigma^2 = \sigma_2^2(1 - \rho^2)$.

If it is desired to test the hypothesis $H_0: \rho = 0$, the fact that $\beta_1 = \rho\frac{\sigma_2}{\sigma_1}$ suggests using the test of $H_0: \beta_1 = 0$ in Part 2 of Theorem 15.3.7. That is, reject $H_0$ at level $\alpha$ if $|t_1| \geq t_{1-\alpha/2}(n - 2)$. Of course, the resulting test is a conditional test, but a conditional test of size $\alpha$ also gives an unconditional test of size $\alpha$.

The following derivation shows that an equivalent test can be stated in terms 
of the sample correlation coefficient +. Specifically, if 


t=./> (x; — x)6,/e (15.5.2) 


then under H,:p = 0 and conditional on X¥ = x, t is the observed value of a 
random variable T that is distributed the same as 7,. In other words, under 
Hy): p =0 and, conditional on the observed x,’s, the variable T ~ t(n — 2). But 
the MLE of f, for the simple linear model is 


b, ~ >» (x; — x)y; 


> (x; — x)? 
ae x (x; — X)(¥i — Y) 
¥ (Gr x) 
EE) Oy = O01 = 9») Yi — 9? 


. yO -— xP Yn - 5H? YG)? 


ae ore = a (15.5.3) 
x;—-x 
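where $r$ is the estimate (15.5.1). Recall that for the centered regression problem, $\hat\beta_c = \bar y$, and thus

$$\begin{aligned}
(n - 2)\tilde\sigma^2 &= \sum [y_i - \hat\beta_c - \hat\beta_1(x_i - \bar x)]^2 \\
&= \sum [(y_i - \bar y) - \hat\beta_1(x_i - \bar x)]^2 \\
&= \sum (y_i - \bar y)^2 - \hat\beta_1^2 \sum (x_i - \bar x)^2 \\
&= \sum (y_i - \bar y)^2 (1 - r^2)
\end{aligned}$$

The third line uses $\sum (x_i - \bar x)(y_i - \bar y) = \hat\beta_1 \sum (x_i - \bar x)^2$.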

It follows that

$$\tilde\sigma^2 = \frac{(1 - r^2)\sum (y_i - \bar y)^2}{n - 2} \tag{15.5.4}$$


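The last expression is based on substituting the right side of equation (15.5.3) for $\hat\beta_1$ and some simplification. By substituting equations (15.5.3) and (15.5.4) into (15.5.2), we obtain

$$t = \frac{\sqrt{\sum (x_i - \bar x)^2}\; r\sqrt{\sum (y_i - \bar y)^2 \big/ \sum (x_i - \bar x)^2}}{\sqrt{(1 - r^2)\sum (y_i - \bar y)^2/(n - 2)}} = \frac{\sqrt{n - 2}\, r}{\sqrt{1 - r^2}} \tag{15.5.5}$$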
It follows that a size $\alpha$ test of $H_0: \rho = 0$ versus $H_a: \rho \ne 0$ is to reject $H_0$ if $|t| \ge t_{1-\alpha/2}(n - 2)$, where $t = \sqrt{n - 2}\, r/\sqrt{1 - r^2}$. This provides a convenient test for independence of $X$ and $Y$, because bivariate normal random variables are independent if and only if they are uncorrelated. These results are summarized in the following theorem.


Theorem 15.5.2 Assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate normal distribution and let $t = \sqrt{n - 2}\, r/\sqrt{1 - r^2}$.


1. A size $\alpha$ test of the hypothesis $H_0: \rho = 0$ versus $H_a: \rho \ne 0$ is to reject if $|t| \ge t_{1-\alpha/2}(n - 2)$.
2. A size $\alpha$ test of the hypothesis $H_0: \rho \le 0$ versus $H_a: \rho > 0$ is to reject if $t \ge t_{1-\alpha}(n - 2)$.
3. A size $\alpha$ test of the hypothesis $H_0: \rho \ge 0$ versus $H_a: \rho < 0$ is to reject if $t \le -t_{1-\alpha}(n - 2)$.
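The equivalence established in (15.5.2) through (15.5.5) can be checked numerically. The sketch below, which assumes the data are in plain Python lists and uses the thermal-pollution data of Exercise 1 purely as test input, computes $t = \sqrt{n-2}\,r/\sqrt{1-r^2}$ from the sample correlation and, separately, the slope statistic $\sqrt{\sum(x_i-\bar x)^2}\,\hat\beta_1/\tilde\sigma$; the two should agree to rounding error. The function names are illustrative, and the comparison with $t_{1-\alpha/2}(n-2)$ is left to a $t$ table, since the Python standard library has no $t$ quantile function.

```python
import math


def _centered_sums(x, y):
    """Return (xbar, ybar, Sxx, Syy, Sxy) for paired lists x and y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return xbar, ybar, sxx, syy, sxy


def corr_t_statistic(x, y):
    """Return (r, t) with t = sqrt(n - 2) r / sqrt(1 - r^2), as in (15.5.5)."""
    n = len(x)
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    r = sxy / math.sqrt(sxx * syy)
    return r, math.sqrt(n - 2) * r / math.sqrt(1 - r ** 2)


def slope_t_statistic(x, y):
    """Return sqrt(Sxx) * beta1_hat / sigma_tilde, the statistic in (15.5.2)."""
    n = len(x)
    xbar, ybar, sxx, _, sxy = _centered_sums(x, y)
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * xbar
    sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_tilde = math.sqrt(sse / (n - 2))  # square root of the unbiased estimate of sigma^2
    return math.sqrt(sxx) * beta1 / sigma_tilde


# Thermal-pollution data of Exercise 1, used here only as test input.
x = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
y = [1.00, 0.95, 0.95, 0.90, 0.85, 0.70, 0.65, 0.60, 0.55, 0.40]
r, t = corr_t_statistic(x, y)
print(round(r, 4), round(t, 4), round(slope_t_statistic(x, y), 4))
```

The last two printed values coincide, which is exactly the content of (15.5.5).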


It is interesting to note that for the above conditional distribution of $T$ to hold, it is only necessary to assume that the $2n$ variables $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ are independent and that each $Y_i \sim N(\mu, \sigma^2)$ for some $\mu$ and $\sigma^2$. Thus, the test described above is appropriate for testing independence of $X$ and $Y$, even if the distribution of $X$ is arbitrary. This is summarized by the following theorem.


Theorem 15.5.3 Assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate population with pdf $f(x, y)$. If $Y_i \sim N(\mu, \sigma^2)$ for each $i = 1, \ldots, n$, then a size $\alpha$ test of the null hypothesis that the variables $X_i$ and $Y_i$ are independent is to reject if

$$|t| \ge t_{1-\alpha/2}(n - 2) \quad \text{where} \quad t = \frac{\sqrt{n - 2}\, r}{\sqrt{1 - r^2}}$$


The test defined in Theorem 15.5.2 does not extend readily to testing values of $\rho$ other than $\rho = 0$. However, tests for nonzero values of $\rho$ can be based on the following theorem, which is stated without proof.


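Theorem 15.5.4 Assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate normal distribution, $BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, and define

$$V = \frac{1}{2}\ln\left(\frac{1 + R}{1 - R}\right) \qquad m = \frac{1}{2}\ln\left(\frac{1 + \rho}{1 - \rho}\right)$$


Then $Z = \sqrt{n - 3}\,(V - m) \xrightarrow{d} N(0, 1)$ as $n \to \infty$.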


That is, for large $n$, $V$ is approximately normal with mean $m$ and variance $1/(n - 3)$. This provides an obvious approach for testing hypotheses about $\rho$, and such tests are given in the following theorem.


Theorem 15.5.5 If $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a bivariate normal distribution, $BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, and $z_0 = \sqrt{n - 3}\,(v - m_0)$ with $v = \frac{1}{2}\ln[(1 + r)/(1 - r)]$ and $m_0 = \frac{1}{2}\ln[(1 + \rho_0)/(1 - \rho_0)]$, then
1. An approximate size $\alpha$ test of $H_0: \rho = \rho_0$ versus $H_a: \rho \ne \rho_0$ is to reject $H_0$ if $|z_0| \ge z_{1-\alpha/2}$.
2. An approximate size $\alpha$ test of $H_0: \rho \le \rho_0$ versus $H_a: \rho > \rho_0$ is to reject $H_0$ if $z_0 \ge z_{1-\alpha}$.
3. An approximate size $\alpha$ test of $H_0: \rho \ge \rho_0$ versus $H_a: \rho < \rho_0$ is to reject $H_0$ if $z_0 \le -z_{1-\alpha}$.
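A minimal sketch of part 1 of Theorem 15.5.5, assuming Python 3.8 or later for `statistics.NormalDist`. The function name and the summary values in the usage line are hypothetical. The one-sided tests in parts 2 and 3 compare the same $z_0$ with $z_{1-\alpha}$ instead.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n, rho0, alpha=0.05):
    """Approximate size-alpha test of H0: rho = rho0 vs Ha: rho != rho0
    (part 1 of Theorem 15.5.5). Returns (z0, reject)."""
    if not (-1 < r < 1 and -1 < rho0 < 1) or n <= 3:
        raise ValueError("need |r| < 1, |rho0| < 1 and n > 3")
    v = 0.5 * math.log((1 + r) / (1 - r))         # v = (1/2) ln[(1 + r)/(1 - r)]
    m0 = 0.5 * math.log((1 + rho0) / (1 - rho0))  # m0 = (1/2) ln[(1 + rho0)/(1 - rho0)]
    z0 = math.sqrt(n - 3) * (v - m0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    return z0, abs(z0) >= z_crit


# Hypothetical summary values: r = 0.5 from n = 28 pairs, testing rho0 = 0.2.
z0, reject = fisher_z_test(0.5, 28, 0.2)
print(round(z0, 4), reject)  # 1.7329 False
```

With these values the test does not reject at $\alpha = 0.05$ but would at $\alpha = 0.10$, since $z_{0.95} \approx 1.645$.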


It also is possible to construct confidence intervals for $\rho$ based on this approximation. For example, the approximate normal variable $Z$ can be used to derive a confidence interval for $m$ of the form

$$(m_1, m_2) = \left(v - \frac{z_{1-\alpha/2}}{\sqrt{n - 3}},\ v + \frac{z_{1-\alpha/2}}{\sqrt{n - 3}}\right)$$


Limits for a confidence interval for $\rho$ are obtained by solving the equations $m_i = \frac{1}{2}\ln[(1 + \rho_i)/(1 - \rho_i)]$ for $i = 1, 2$. The resulting confidence interval is of the form $(\rho_1, \rho_2)$ where $\rho_i = [\exp(2m_i) - 1]/[\exp(2m_i) + 1]$ for $i = 1, 2$.


Example 15.5.7


Consider the auto emissions data of Example 15.3.1. Although, strictly speaking, the variable $x$ is not random in this example, we will use it to illustrate the computation of $r$ and a test of hypotheses. Recall that $n = 11$, $\sum x_i = 304.377$, $\sum x_i^2 = 10461.814$, $\sum y_i = 3.4075$, $\sum y_i^2 = 1.063$, $\sum x_i y_i = 97.506$, $\bar x = 27.671$, and $\bar y = 0.310$. Thus, $\sum (x_i - \bar x)^2 = \sum x_i^2 - n\bar x^2 = 2039.287$, $\sum (y_i - \bar y)^2 = \sum y_i^2 - n\bar y^2 = 0.0059$, and $\sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - n\bar x\bar y = 3.1479$. It follows that $r = 3.1479/\sqrt{(2039.287)(0.0059)} = 0.908$. A test of $H_0: \rho = 0$ versus $H_a: \rho \ne 0$ is based on $t = \sqrt{9}\,(0.908)/\sqrt{1 - (0.908)^2} = 6.502$, which, of course, would reject at any practical level of significance, indicating a near linear relationship between the variables.
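The arithmetic of the example can be reproduced from the corrected sums quoted above. Note that the text rounds $r$ to 0.908 before forming $t$; carrying full precision gives $t \approx 6.48$, which changes nothing about the conclusion. As an additional illustration not computed in the text, the same value of $r$ is run through the approximate interval for $\rho$ described before the example (interval for $m$, then back-transform). This is a standard-library Python sketch; `fisher_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

# Corrected sums quoted in the auto emissions example (n = 11).
n = 11
sxx, syy, sxy = 2039.287, 0.0059, 3.1479

r = sxy / math.sqrt(sxx * syy)                     # sample correlation
t = math.sqrt(n - 2) * r / math.sqrt(1 - r ** 2)   # statistic of Theorem 15.5.2
t_text = math.sqrt(n - 2) * 0.908 / math.sqrt(1 - 0.908 ** 2)  # r rounded as in the text
print(f"r = {r:.3f}, t = {t:.2f}, t with r = 0.908: {t_text:.3f}")
# r = 0.908, t = 6.48, t with r = 0.908: 6.502


def fisher_ci(r, n, conf=0.95):
    """Approximate CI for rho: interval (m1, m2) for m, then
    rho_i = [exp(2 m_i) - 1] / [exp(2 m_i) + 1]."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    v = 0.5 * math.log((1 + r) / (1 - r))
    half = z / math.sqrt(n - 3)

    def to_rho(m):
        return (math.exp(2 * m) - 1) / (math.exp(2 * m) + 1)

    return to_rho(v - half), to_rho(v + half)


lo, hi = fisher_ci(r, n)
print(f"approximate 95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

With only $n = 11$ pairs the large-sample approximation behind the interval should be read as rough.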


In Chapter 12, a paired-sample $t$ test was discussed for testing the difference of means for a bivariate normal distribution, with the variances and the correlation coefficient treated as unknown nuisance parameters. We now consider the problem of a simultaneous test of equality of the means and variances of a bivariate normal population with unknown correlation coefficient. This test was suggested by Bradley and Blackwood (1989). Suppose $X$ and $Y$ are bivariate normal, $(X, Y) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. It follows from the results of Chapter 5 that the sum $S = X + Y$ and difference $D = X - Y$ also are bivariate normal with means $\mu_S = \mu_1 + \mu_2$ and $\mu_D = \mu_1 - \mu_2$, variances $\sigma_S^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2$ and $\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$, and covariance


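$$\begin{aligned}
\mathrm{Cov}(S, D) &= \mathrm{Cov}(X + Y, X - Y) \\
&= \mathrm{Var}(X) + \mathrm{Cov}(X, Y) - \mathrm{Cov}(X, Y) - \mathrm{Var}(Y) \\
&= \sigma_1^2 - \sigma_2^2
\end{aligned}$$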


Thus, the correlation coefficient of $S$ and $D$ is

$$\rho_{SD} = \frac{\sigma_1^2 - \sigma_2^2}{\sigma_S \sigma_D}$$


As a consequence of Theorem 5.4.8, we know that the conditional distribution of $D$ given $S = s$ is normal,

$$D \mid s \sim N\left(\mu_D + \rho_{SD}\frac{\sigma_D}{\sigma_S}(s - \mu_S),\ \sigma_D^2(1 - \rho_{SD}^2)\right)$$


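Notice that if we let

$$\beta_0 = \mu_D - \rho_{SD}\frac{\sigma_D}{\sigma_S}\mu_S = (\mu_1 - \mu_2) - \frac{\sigma_1^2 - \sigma_2^2}{\sigma_S^2}(\mu_1 + \mu_2)$$

and

$$\beta_1 = \rho_{SD}\frac{\sigma_D}{\sigma_S} = \frac{\sigma_1^2 - \sigma_2^2}{\sigma_S^2}$$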
then, conditional on $S = s$, $D$ is normal with mean $E(D \mid s) = \beta_0 + \beta_1 s$ and variance $\mathrm{Var}(D \mid s) = \sigma_D^2(1 - \rho_{SD}^2)$. Thus, conditionally we have a simple linear regression model. Because $\mu_1 = \mu_2$ and $\sigma_1^2 = \sigma_2^2$ if and only if $\beta_0 = \beta_1 = 0$, the joint null hypothesis $H_0: \mu_1 = \mu_2, \sigma_1^2 = \sigma_2^2$ is equivalent to the joint null hypothesis $H_0: \beta_0 = \beta_1 = 0$, which can be tested easily with the results of Theorem 15.4.7. In particular, if $s_1, \ldots, s_n$ and $d_1, \ldots, d_n$ are the sums and differences of $n$ pairs $(x_i, y_i)$ based on a random sample from a bivariate normal population, then a size $\alpha$ test of $H_0$ would reject if


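$$\frac{\left(\sum d_i^2 - SSE\right)/2}{SSE/(n - 2)} = \frac{[S(\mathbf{0}) - S(\hat{\boldsymbol\beta})]/2}{S(\hat{\boldsymbol\beta})/(n - 2)} \ge f_{1-\alpha}(2, n - 2)$$

with $\hat{\boldsymbol\beta} = (X'X)^{-1}X'Y$, where $X$ is the $n \times 2$ matrix whose $i$th row is $(1, s_i)$ and $Y' = (d_1, \ldots, d_n)$.

It also is possible to test the equality of variances, because $\sigma_1^2 = \sigma_2^2$ if and only if $\beta_1 = 0$. Theorem 15.4.7 yields a test of $H_0: \sigma_1^2 = \sigma_2^2$ versus the alternative $H_a: \sigma_1^2 \ne \sigma_2^2$; namely, we reject $H_0$ if

$$\frac{\sum (d_i - \bar d)^2 - SSE}{SSE/(n - 2)} = \frac{S(\hat\beta_0) - S(\hat{\boldsymbol\beta})}{S(\hat{\boldsymbol\beta})/(n - 2)} \ge f_{1-\alpha}(1, n - 2)$$

with $\hat\beta_0 = \bar d$ over the parameter subspace with $\beta_1 = 0$ and $\hat{\boldsymbol\beta}$ the unrestricted ML solution of the previous test.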
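A sketch of both tests, assuming the paired observations are in plain Python lists: it regresses $d_i$ on $s_i$ by least squares and forms the two $F$ ratios above. The function name and the data are hypothetical, and the statistics still have to be compared with tabled $f_{1-\alpha}(2, n-2)$ and $f_{1-\alpha}(1, n-2)$ values, since the standard library provides no $F$ quantiles.

```python
def bradley_blackwood(x, y):
    """F statistics from the regression of d = x - y on s = x + y.

    Returns (f_joint, f_var):
      f_joint tests H0: mu1 = mu2 and sigma1^2 = sigma2^2 (compare with f_{1-alpha}(2, n-2));
      f_var   tests H0: sigma1^2 = sigma2^2             (compare with f_{1-alpha}(1, n-2)).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with n >= 3")
    s = [xi + yi for xi, yi in zip(x, y)]
    d = [xi - yi for xi, yi in zip(x, y)]
    s_bar, d_bar = sum(s) / n, sum(d) / n
    s_ss = sum((si - s_bar) ** 2 for si in s)
    b1 = sum((si - s_bar) * (di - d_bar) for si, di in zip(s, d)) / s_ss
    b0 = d_bar - b1 * s_bar
    sse = sum((di - b0 - b1 * si) ** 2 for si, di in zip(s, d))  # S(beta_hat), unrestricted
    s_zero = sum(di ** 2 for di in d)                              # S(0): beta0 = beta1 = 0
    s_sub = sum((di - d_bar) ** 2 for di in d)                     # beta0 = d_bar, beta1 = 0
    mse = sse / (n - 2)
    f_joint = ((s_zero - sse) / 2) / mse
    f_var = (s_sub - sse) / mse
    return f_joint, f_var


# Hypothetical paired data, used only to exercise the function.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 1.8, 3.9, 3.7, 6.2, 5.4]
f_joint, f_var = bradley_blackwood(x, y)
print(round(f_joint, 3), round(f_var, 3))
```

Swapping the roles of $x$ and $y$ only changes the signs of $d_i$, $\hat\beta_0$, and $\hat\beta_1$, so both statistics are unchanged, as they should be for symmetric hypotheses.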


SUMMARY 


Many problems in statistics involve modeling the relationship between a variable 
Y, which is observed in an experiment, and one or more variables x, which the 
experimenter assumes can be controlled or measured without error. We have 
considered the approach of linear regression analysis, which assumes that Y can 
be represented as a function that is linear in the coefficients, plus an error term 
whose expectation is zero. It was shown that estimates with minimum variance among the class of linear unbiased estimates could be obtained with the mild assumption that the errors are uncorrelated with equal variances. With the additional assumption that the errors are independent normal, it also was possible to obtain confidence limits and tests of hypotheses about the parameters of the
model. We have only scratched the surface of the general problem of regression 
analysis. For additional reading, the book by Draper and Smith (1981) is recom- 
mended. 
EXERCISES


1. In a study of the effect of thermal pollution on fish, the proportion of a certain variety of
sunfish surviving a fixed level of thermal pollution was determined by Matis and Wehrly 
(1979) for various exposure times. The following paired data were reported on scaled time 
(x) versus proportion surviving (y). 


x: 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55 
y: 1.00, 0.95, 0.95, 0.90, 0.85, 0.70, 0.65, 0.60, 0.55, 0.40 


(a) Plot the paired data as points in an x-y coordinate system as in Figure 15.1.

(b) Assuming a simple linear model, compute the LS estimates of $\beta_0$ and $\beta_1$.

(c) Estimate $E(Y) = \beta_0 + \beta_1 x$ if the exposure time is $x = 0.325$ units. Include the graph of $\hat y = \hat\beta_0 + \hat\beta_1 x$ with the plotted data from (a).

(d) Compute the SSE and give an unbiased estimate of $\sigma^2 = \mathrm{Var}(Y)$.


2. Mullet (1977) considers the goals scored per game by the teams in the National Hockey League. The average number of goals scored per game at home and away by each team in the 1973-74 season was:


Team              At Home   Away

Boston              4.95    4.00
Montreal            4.10    3.41
N.Y. Rangers        4.26    3.44
Toronto             3.69    3.33
Buffalo             3.64    2.56
Detroit             4.36    2.18
Vancouver           3.08    2.67
N.Y. Islanders      2.46    2.21
Philadelphia        3.90    3.10
Chicago             3.64    3.33
Los Angeles         3.36    2.62
Atlanta             3.10    2.38
Pittsburgh          3.18    3.03
St. Louis           3.08    2.21
Minnesota           3.69    2.33
California          2.87    2.13


(a) Plot the paired data in an x-y coordinate system with x the average number of goals at home and y the average number of goals away.


(b) Assuming a simple linear model, compute the LS estimates of $\beta_0$ and $\beta_1$.


(c) Predict the average number of away goals per game scored by a team that scored four goals per game at home.


(d) Include the graph of the estimated regression function with the plotted data from (a).


(e) Compute an unbiased estimate of $\sigma^2$.
3. Rework Exercise 1 after centering the $x_i$'s about $\bar x$. That is, first replace each $x_i$ with $x_i - \bar x$.


4. Rework Exercise 2 after centering the $x_i$'s about $\bar x$.


5. Show that the error sum of squares can be written as
$$SSE = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i$$


6. Show each of the following:
(a) Residuals can be written as $\hat e_i = y_i - \bar y - \hat\beta_1(x_i - \bar x)$.
(b) $SSE = \sum (y_i - \bar y)^2 - \hat\beta_1^2 \sum (x_i - \bar x)^2$.


7. For the constants $a_1, \ldots, a_n$ in part 4 of Theorem 15.3.1, show that $\sum a_i = 0$ and $\sum a_i x_i = 0$ imply $\sum a_i b_i = 0$.


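8. Assume $Y_1, \ldots, Y_n$ are uncorrelated random variables with $E(Y_i) = \beta_0 + \beta_1 x_i$ and $\mathrm{Var}(Y_i) = \sigma^2$, and that $\hat\beta_0$ and $\hat\beta_1$ are the LS estimates of $\beta_0$ and $\beta_1$. Show each of the following:


(a) $E(\hat\beta_0^2) = \beta_0^2 + \dfrac{\sigma^2}{n}\left[1 + n\bar x^2 \Big/ \sum (x_i - \bar x)^2\right]$.
Hint: Use general properties of variance and the results of Theorem 15.3.1.
(b) $E(\hat\beta_1^2) = \beta_1^2 + \sigma^2 \big/ \sum (x_i - \bar x)^2$.
(c) $E\left[\sum (Y_i - \bar Y)^2\right] = (n - 1)\sigma^2 + \beta_1^2 \sum (x_i - \bar x)^2$.
Hint: Use an argument similar to the proof of Theorem 8.2.2.
(d) $\tilde\sigma^2 = \sum [Y_i - \hat\beta_0 - \hat\beta_1 x_i]^2/(n - 2)$ is an unbiased estimate of $\sigma^2$.
Hint: Use (b) and (c) together with Exercise 6.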


9. Consider the bus motor failure data in Exercise 15 of Chapter 13.

(a) Assume that the mileages for the first set of data are normally distributed with mean $\mu$ and variance $\sigma^2$. Apply the method described in Example 15.3.2, with $x_i = a_i$ and $y_i = \Phi^{-1}(F_i)$, to estimate $\mu$ and $\sigma^2$.

(b) For the fifth set of bus motor failure data, assume that the mileages have a two-parameter exponential distribution with location parameter $\eta$ and scale parameter $\theta$. Apply the method described in Example 15.3.2 with $x_i = a_i$ and $y_i = G^{-1}(F_i)$.


10. Rework (a) from Exercise 9 but use the method in which $x_i = \Phi^{-1}(F_i)$ and $y_i = a_i$.


11. Verify that the MLEs $\hat\beta_c$, $\hat\beta_1$, and $\hat\sigma^2$ for the centered regression model of Section 3 are jointly complete and sufficient for $\beta_c$, $\beta_1$, and $\sigma^2$.


12. Verify the identities that follow equations (15.3.6) and (15.3.7).
Hint: Note that $\sum (x_i - \bar x) = 0$ and $\sum (x_i - \bar x) b_i = 1$.


13. Derive equation (15.3.8). Hint: Add and subtract $\hat\beta_c + \hat\beta_1(x_i - \bar x)$ within the squared brackets, then use the binomial expansion. The term involving $(\hat\beta_c - \beta_c)(\hat\beta_1 - \beta_1)$ is zero because $\sum (x_i - \bar x) = 0$.
14. Derive the joint MGF of $\hat\beta_0$ and $\hat\beta_1$ under the assumptions of Theorem 15.3.4. Hint: Use the fact that $\hat\beta_c$ and $\hat\beta_1$ are independent normally distributed random variables and that $\hat\beta_0 = \hat\beta_c - \hat\beta_1\bar x$. Recall that the MGF of bivariate normal random variables is given in Example 5.5.2.


15. For the thermal pollution data of Exercise 1, assume that errors are independent and normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$.
(a) Compute 95% confidence limits for $\beta_0$.
(b) Compute 95% confidence limits for $\beta_1$.
(c) Compute 95% confidence limits for $\sigma$.
(d) Perform a size $\alpha = .01$ test of $H_0: \beta_0 = 1$ versus $H_a: \beta_0 \ne 1$.
(e) Perform a size $\alpha = .10$ test of $H_0: \beta_1 = -1.5$ versus $H_a: \beta_1 \ne -1.5$.
(f) Perform a size $\alpha = .10$ test of $H_0: \sigma = .05$ versus $H_a: \sigma \ne .05$.


16. Under the assumptions of Theorem 15.3.7,
(a) Derive both upper and lower one-sided tests of size $\alpha$ for $\beta_0$.
(b) Redo (a) for $\beta_1$.
(c) Redo (a) for $\sigma^2$.


17. Let $Y_1, \ldots, Y_n$ be independent where $Y_i \sim N(\beta x_i, \sigma^2)$ with both $\beta$ and $\sigma^2$ unknown.

(a) If $y_1, \ldots, y_n$ are observed, derive the MLEs $\hat\beta$ and $\hat\sigma^2$ based on the pairs $(x_1, y_1), \ldots, (x_n, y_n)$.

(b) Show that the estimator $\hat\beta$ is normally distributed. What are $E(\hat\beta)$ and $\mathrm{Var}(\hat\beta)$?

(c) Show that the estimators $\hat\beta$ and $\hat\sigma^2$ are independent.

(d) Find an unbiased estimator $\tilde\sigma^2$ of $\sigma^2$ and constants, say $c$ and $\nu$, such that $c\tilde\sigma^2 \sim \chi^2(\nu)$.

(e) Find a pivotal quantity for $\beta$.

(f) Derive a $(1 - \alpha)100\%$ confidence interval for $\beta$.

(g) Derive a $(1 - \alpha)100\%$ confidence interval for $\sigma$.


18. In a test to determine the static stiffness of major league baseballs, each of six balls was subjected to a different amount of force x (in pounds), and the resulting displacement y (in inches) was measured. The data are given as follows:


x:    10     20     30     40     50     60
y:  .045   .071   .070   .172   .120   .131

Assuming the regression model of Exercise 17,
(a) Compute the MLEs $\hat\beta$ and $\hat\sigma^2$.
(b) Compute an unbiased estimate of $\sigma^2$.
(c) Compute a 95% confidence interval for $\beta$.


(d) Compute a 90% confidence interval for $\sigma^2$.


19. Assume independent $Y_i \sim EXP(\beta x_i)$, $i = 1, \ldots, n$.
(a) Find the LS estimator of $\beta$.
(b) Find the MLE of $\beta$.

20. Apply the regression model of Exercise 19 to the baseball data of Exercise 18 and obtain the MLE of $\beta$.


21. Assume that $Y_1, \ldots, Y_n$ are independent Poisson-distributed random variables, $Y_i \sim POI(\lambda x_i)$.
(a) Find the LS estimator of $\lambda$.
(b) Find the ML estimator of $\lambda$.
(c) Are both the LS and ML estimators unbiased for $\lambda$?
(d) Find the variances of both estimators.

22. Assume that $Y_1, \ldots, Y_n$ are uncorrelated random variables with means $E(Y_i) = \beta_0 + \beta_1 x_i$ and let $w_1, \ldots, w_n$ be known positive constants.
(a) Derive the solutions $\beta_0 = \hat\beta_0$ and $\beta_1 = \hat\beta_1$ that minimize the sum $S = \sum w_i(y_i - \beta_0 - \beta_1 x_i)^2$. The solutions $\hat\beta_0$ and $\hat\beta_1$ are called weighted least squares estimates.
(b) Show that if $\mathrm{Var}(Y_i) = \sigma^2/w_i$ for each $i = 1, \ldots, n$, and some unknown $\sigma^2 > 0$, then $\hat\beta_0$ and $\hat\beta_1$ are the BLUEs of $\beta_0$ and $\beta_1$.
Hint: Use Theorem 15.4.2.

23. Let $X_1, \ldots, X_n$ be a random sample from $EXP(\theta, \eta)$, and let $Z_{i:n} = (X_{i:n} - \eta)/\theta$.

(a) Show that $E(Z_{i:n}) = \sum_{k=1}^{i} \dfrac{1}{n - k + 1}$.

(b) Show that $v_{ij} = \mathrm{Cov}(Z_{i:n}, Z_{j:n}) = \sum_{k=1}^{m} \dfrac{1}{(n - k + 1)^2}$, where $m = \min(i, j)$.

(c) Show that $(V^{-1})_{ii} = (n - i + 1)^2 + (n - i)^2$, $i = 1, \ldots, n$; $(V^{-1})_{i,i+1} = (V^{-1})_{i+1,i} = -(n - i)^2$, $i = 1, \ldots, n - 1$; and all other elements equal zero.

(d) Show that the BLUEs, based on order statistics, are $\tilde\theta = n(\bar X - X_{1:n})/(n - 1)$ and $\tilde\eta = X_{1:n} - (\bar X - X_{1:n})/(n - 1)$.

(e) Compare the estimators in (d) to the MLEs.


24. Assume that the data of Example 4.6.3 are the observed values of order statistics for a random sample from $EXP(\theta, \eta)$. Use the results of Exercise 23 to compute the BLUEs of $\theta$ and $\eta$.


25. Under the assumptions of Theorem 15.4.3, verify equation (15.4.11).
Hint: Note that $(Y - X\hat\beta)'(Y - X\hat\beta) = Y'(Y - X\hat\beta) - \hat\beta'[X'Y - (X'X)\hat\beta]$ and make use of equation (15.4.7).


26. Assume that $(X_1, X_2) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. Recall the joint MGF $M(t_1, t_2)$ is given in Example 5.5.2. Show that if $\mu' = (\mu_1, \mu_2)$, $t' = (t_1, t_2)$, and $V = \{\mathrm{Cov}(X_i, X_j)\}$, then the joint MGF can be written $M(t_1, t_2) = \exp[t'\mu + (1/2)t'Vt]$.


27. Using advanced matrix theory, it can be shown that the joint MGF of a vector of $k$-variate normal random variables, $X' = (X_1, \ldots, X_k)$, has the form given in Exercise 26, namely $M(t_1, \ldots, t_k) = \exp[t'\mu + (1/2)t'Vt]$ with $\mu' = (\mu_1, \ldots, \mu_k)$ and $t' = (t_1, \ldots, t_k)$. Assuming this result, show the following:

(a) The marginal distribution of each component is normal, $X_i \sim N(\mu_i, \sigma_i^2)$.


(b) If $a' = (a_1, \ldots, a_k)$ and $b' = (b_1, \ldots, b_k)$ and $U = a'X$ and $W = b'X$, then $U$ and $W$ have a bivariate normal distribution.
(c) $U$ and $W$ are independent if and only if $a'Vb = 0$.


28. Using only the first eight pairs of hydrocarbon emission data of Example 15.3.1:

(a) Compute the LS estimates of $\beta_0$, $\beta_1$, and $\beta_2$ for the second-degree polynomial model; also compute the unbiased estimate $\tilde\sigma^2$.

(b) Express the regression function } based on the smaller set of data and sketch its 
graph. Compare this to the function }, which was based on all n = 11 pairs. Does it 
make sense in either case to predict hydrocarbon emissions past the range of the 
data, say for x = 60,000 miles? 

(c) Assuming the conditions of Theorem 15.4.3, compute a 95% confidence interval for $\beta_0$ based on the estimate from (a).

(d) Repeat (c) for $\beta_1$.

(e) Repeat (c) for $\beta_2$.

(f) Compute a 95% confidence interval for $\sigma^2$.


29. Verify equation (15.4.15).

Hint: In the formula for $S(\beta)$, add and subtract $X\hat\beta$ in the expression $Y - X\beta$ and then simplify.

30. Sketch the confidence contour for the coefficients $\beta_0$ and $\beta_1$ in the simple linear model using the baseball data of Exercise 18. Use $\gamma = .95$.


31. Test the hypothesis $H_0: \beta_1 = \beta_2 = 0$ concerning the coefficients of the second-degree polynomial model in Exercise 28. Use $\alpha = 0.005$.


32. Compute the sample correlation coefficient for the hockey goal data of Exercise 2, with $x_i$ = average goals at home and $y_i$ = average goals away for the $i$th team.


33. Under the assumptions of Theorem 15.5.2 and using the data of Exercise 2, perform a size $\alpha = .10$ test of $H_0: \rho = 0$ versus $H_a: \rho \ne 0$. Note: This assumes that the pairs $(X_i, Y_i)$ are identically distributed from one team to the next, which is questionable, but we will assume this for the sake of the problem.


34. Using the hockey goal data, and assuming bivariate normality, construct 95% confidence limits for $\rho$.

35. For the hockey goal data, assuming bivariate normality, test each of the following hypotheses at level $\alpha = .05$:

(a) $H_0: \mu_1 = \mu_2$ and $\sigma_1^2 = \sigma_2^2$ versus $H_a: \mu_1 \ne \mu_2$ or $\sigma_1^2 \ne \sigma_2^2$.

(b) $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \ne \sigma_2^2$.


RELIABILITY AND SURVIVAL DISTRIBUTIONS


16.1 INTRODUCTION
Many important statistical applications occur in the area of reliability and life
testing. If the random variable X represents the lifetime or time to failure of a 
unit, then X will assume only nonnegative values. Thus, distributions such as the 
Weibull, gamma, exponential, and lognormal distributions are of particular inter- 
est in this area. The Weibull distribution is a rather flexible two-parameter 
model, and it has become the most important model in this area. One possible 
theoretical justification for this in certain cases is that it is a limiting extreme- 
value distribution. 

One aspect of life testing that is not so common in other areas is that of censored sampling. If a random sample of $n$ items is placed on life test, then the first observed failure time is automatically the smallest order statistic, $x_{1:n}$. Similarly, the second recorded failure time is $x_{2:n}$, and so on. If the experiment is terminated after the first $r$ ordered observations are obtained, then this is referred to as Type II censored sampling on the right. If for some reason the first $s$ ordered observations are not available, then this is referred to as Type II censored sampling on the left. If the experiment is terminated after a fixed time, $x_0$, then this is known as Type I censored sampling, or sometimes truncated sampling. If all $n$ ordered observations are obtained, then this is called complete sampling.

Because the observations are naturally ordered, not all the information is lost for the censored items. It is known that items censored on the right have survived at least until time $x_0$. Also, great savings in time may result from censoring. If 100 light bulbs are placed on life test, then the first 50 may fail in one year, whereas it may take 20 years for the last one to fail. Similarly, if 50 light bulbs are placed on test, then it may take 10, 15, or 20 years for all 50 to fail, yet the first 50 failure times from a sample of size 100, obtained in one year, may in some cases contain as much information as the 50 failure times from a complete sample of size 50. The expected length of experiment required to obtain the first $r$ ordered observations from a sample of size $n$ is $E(X_{r:n})$. These values can be compared for different values of $r$ and $n$ for different distributions.
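To make the light-bulb comparison concrete, suppose (as an assumption; the text does not specify a lifetime distribution here) that lifetimes are exponential with mean $\theta$. The standard representation of exponential order statistics through independent spacings then gives $E(X_{r:n}) = \theta \sum_{i=1}^{r} 1/(n - i + 1)$, which the sketch below evaluates; the function name is hypothetical.

```python
def expected_exp_order_stat(r, n, theta=1.0):
    """E(X_{r:n}) for a random sample of size n from EXP(theta).

    Uses the standard spacing representation for exponential order statistics:
    E(X_{r:n}) = theta * sum_{i=1}^{r} 1 / (n - i + 1).
    """
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    return theta * sum(1.0 / (n - i + 1) for i in range(1, r + 1))


# Expected test duration in units of the mean life theta (exponential lifetimes assumed).
first_50_of_100 = expected_exp_order_stat(50, 100)
all_50_of_50 = expected_exp_order_stat(50, 50)
print(round(first_50_of_100, 3), round(all_50_of_50, 3))  # 0.688 4.499
```

Under this assumption, waiting for the first 50 of 100 failures takes about $0.69\theta$ on average, against about $4.5\theta$ to observe all 50 failures in a sample of size 50.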

If a complete random sample is available, then statistical techniques can be 
expressed in terms of either the random sample or the associated order statistics. 
However, if a censored sample is used, then the statistical techniques and dis- 
tributional results must be developed in terms of the order statistics. 


16.2 RELIABILITY CONCEPTS


If a random variable $X$ represents the lifetime or time to failure of a unit, then the reliability of the unit at time $t$ is defined to be

$$R(t) = P[X > t] = 1 - F_X(t) \tag{16.2.1}$$

The same function, with the notation $S(x) = 1 - F_X(x)$, is called the survivor function in biomedical applications.

Properties of a distribution that we previously studied, such as the mean and variance, remain important in the reliability area, but an additional property that is quite useful is the hazard function (HF) or failure-rate function. The hazard function, $h(x)$, for a pdf is defined to be

$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{-R'(x)}{R(x)} = -\frac{d[\log R(x)]}{dx} \tag{16.2.2}$$
The HF may be interpreted as the instantaneous failure rate, or the conditional density of failure at time $x$, given that the unit has survived until time $x$,

$$\begin{aligned}
f(x \mid X \ge x) &= F'(x \mid X \ge x) \\
&= \lim_{\Delta x \to 0} \frac{F(x + \Delta x \mid X \ge x) - F(x \mid X \ge x)}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{P[x \le X \le x + \Delta x \mid X \ge x]}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{P[x \le X \le x + \Delta x,\ X \ge x]}{\Delta x\, P[X \ge x]} \\
&= \lim_{\Delta x \to 0} \frac{P[x \le X \le x + \Delta x]}{\Delta x\,[1 - F(x)]} \\
&= \frac{f(x)}{1 - F(x)} \\
&= h(x)
\end{aligned} \tag{16.2.3}$$


An increasing HF at time $x$ indicates that the unit is more likely to fail in the next increment of time $(x, x + \Delta x)$ than it would be in an earlier interval of the same length. That is, the unit is wearing out or deteriorating with age. Similarly, a decreasing HF means that the unit is improving with age. A constant hazard function occurs for the exponential distribution, and it reflects the no-memory property of that distribution mentioned earlier.

If $X \sim EXP(\theta)$,

$$h(x) = \frac{(1/\theta)e^{-x/\theta}}{e^{-x/\theta}} = \frac{1}{\theta}$$

In this case the failure rate is the reciprocal of the mean time to failure, and it does not depend on the age of the unit. This assumption may be reasonable for certain types of electrical components, but it would tend not to be true for mechanical components. However, the no-wearout assumption may be reasonable over some restricted time span. The exponential distribution has been an important model in the life-testing area, partly because of its simplicity. The Weibull distribution is a generalization of the exponential distribution, and it is much more flexible.


If $X \sim WEI(\theta, \beta)$, then

$$h(x) = \frac{(\beta/\theta^\beta)\,x^{\beta - 1} e^{-(x/\theta)^\beta}}{e^{-(x/\theta)^\beta}} = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta - 1}$$

FIGURE 16.1 Weibull HFs


This reduces to the exponential case for $\beta = 1$. For $\beta > 1$, the Weibull HF is an increasing function of $x$; and for $\beta < 1$, the HF is a decreasing function of $x$. The Weibull HF is illustrated in Figure 16.1.
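A quick numerical check of the Weibull HF, written as a Python sketch: it computes $f(x)/R(x)$ as in (16.2.2) with $\theta = 1$ for three shape values, which should match $(\beta/\theta)(x/\theta)^{\beta-1}$ and show the decreasing ($\beta < 1$), constant ($\beta = 1$), and increasing ($\beta > 1$) cases. The function names are illustrative.

```python
import math


def weibull_pdf(x, theta, beta):
    """pdf of WEI(theta, beta): (beta/theta) (x/theta)^(beta-1) exp[-(x/theta)^beta], x > 0."""
    return (beta / theta) * (x / theta) ** (beta - 1) * math.exp(-((x / theta) ** beta))


def weibull_reliability(x, theta, beta):
    """R(x) = 1 - F(x) = exp[-(x/theta)^beta]."""
    return math.exp(-((x / theta) ** beta))


def weibull_hazard(x, theta, beta):
    """Closed-form HF h(x) = (beta/theta) (x/theta)^(beta-1)."""
    return (beta / theta) * (x / theta) ** (beta - 1)


xs = [0.5, 1.0, 2.0]
for beta in (0.5, 1.0, 2.0):
    # f(x)/R(x) from (16.2.2); the trend over xs depends on the shape parameter beta.
    ratio = [weibull_pdf(x, 1.0, beta) / weibull_reliability(x, 1.0, beta) for x in xs]
    print(beta, [round(h, 4) for h in ratio])
```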
The gamma distribution also is an important model in life testing. It is not easy to express its HF, but if $X \sim GAM(\theta, k)$, then the HF is increasing for $k > 1$ and decreasing for $k < 1$. For $k > 1$ the HF approaches $1/\theta$ asymptotically from below, while for $k < 1$ the HF approaches $1/\theta$ asymptotically from above. This is substantially different from the Weibull distribution, where the HF approaches $\infty$ or 0 in these cases. The HF for a lognormal distribution is hump-shaped. Although the pdf's in these cases appear quite similar, they clearly have somewhat different characteristics as life-testing distributions; the HF is a very meaningful property for distinguishing between these densities. Indeed, specifying a HF completely determines the CDF and vice versa.


Theorem 16.2.1 For any HF $h(x)$, the associated CDF is determined by the relationship

$$F(x) = 1 - \exp\left[-\int_0^x h(t)\,dt\right] \tag{16.2.4}$$

or

$$f(x) = h(x)\exp\left[-\int_0^x h(t)\,dt\right] \tag{16.2.5}$$
Proof


This result follows because

$$h(x) = -\frac{d}{dx}[\log R(x)]$$

and, since $R(0) = 1$ for a nonnegative lifetime,

$$\int_0^x h(t)\,dt = \Big[-\log R(t)\Big]_0^x = -\log R(x)$$

which gives

$$R(x) = 1 - F(x) = \exp\left[-\int_0^x h(t)\,dt\right] \tag{16.2.6}$$
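Theorem 16.2.1 also suggests a direct numerical recipe: integrate the HF and exponentiate. The sketch below uses the trapezoidal rule (a choice made here, not taken from the text) and assumes a nonnegative lifetime whose HF is finite on $[0, x]$. With a constant HF $1/\theta$ it should reproduce the exponential CDF, and with $h(t) = 2t$ the $WEI(1, 2)$ CDF $1 - e^{-x^2}$; the function name is hypothetical.

```python
import math


def cdf_from_hazard(h, x, steps=10_000):
    """F(x) = 1 - exp[-integral_0^x h(t) dt] (16.2.4), integral by the trapezoidal rule.

    Assumes a nonnegative lifetime and an HF h that is finite on [0, x].
    """
    if x <= 0:
        return 0.0
    dt = x / steps
    interior = sum(h(i * dt) for i in range(1, steps))
    cum_hazard = dt * (0.5 * h(0.0) + interior + 0.5 * h(x))
    return 1.0 - math.exp(-cum_hazard)


# Constant HF 1/theta recovers EXP(theta); linear HF 2t recovers WEI(1, 2).
theta = 2.0
print(cdf_from_hazard(lambda t: 1.0 / theta, 3.0), 1.0 - math.exp(-3.0 / theta))
print(cdf_from_hazard(lambda t: 2.0 * t, 1.5), 1.0 - math.exp(-(1.5 ** 2)))
```

The same function accepts any finite HF, for instance a U-shaped one, which is how a CDF with a prescribed bathtub HF can be obtained numerically.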


Note that a function must satisfy certain properties to be a HF. 


Theorem 16.2.2 A function $h(x)$ is a HF if and only if it satisfies the following properties:

$$h(x) \ge 0 \quad \text{for all } x \tag{16.2.7}$$

$$\int_0^\infty h(x)\,dx = \infty \tag{16.2.8}$$


Proof


The properties are necessary because

$$h(x) = \frac{f(x)}{1 - F(x)} \ge 0$$

and

$$\int_0^\infty h(x)\,dx = \int_0^\infty -d[\log R(x)] = -\log R(x)\Big|_0^\infty = \infty$$

The properties are sufficient because the resulting $F(x)$ will be a valid CDF; that is, in terms of $h(x)$,

$$F(-\infty) = F(0) = 1 - \exp\left[-\int_0^0 h(t)\,dt\right] = 0$$

$$F(\infty) = 1 - \exp\left[-\int_0^\infty h(t)\,dt\right] = 1$$

and $F(x)$ is a nondecreasing function of $x$ because $\int_0^x h(t)\,dt$ is a nondecreasing function of $x$.
One typical life-testing form of HF is a U-shaped or bathtub shape. For
example, a unit may have a fairly high failure rate when it is first put into oper- 
ation, because of the presence of manufacturing defects. If the unit survives the 
early period, then a nearly constant HF may apply for some period, where the 
causes of failure occur “at random.” Later on, the failure rate may begin to 
increase as wearout or old age becomes a factor. In life sciences, such early fail- 
ures correspond to the “infant mortality” effect. 

Unfortunately, none of the common standard distributions will accommodate 
a U-shaped HF. Of course, following Theorem 16.2.1, an F(x) can be derived that 
has a specified U-shaped HF. Quite often it is possible to consider the analysis 
after some “burn-in” period has taken place, and then the more common dis- 
tributions are suitable. The exponential distribution is used extensively in this 
area because of its simplicity. Although its applicability is somewhat limited by 
its constant HF, it may often be useful over a limited time span as suggested 
above, and it is convenient for illustrating many of the concepts and techniques 
applicable to life testing. Also the homogeneous Poisson process assumes that the 
times between occurrences of failures are independent exponential variables, as 
we shall see in Section 16.5. 

The preceding discussion refers to the time to failure of a nonrepairable system 
or the time to first failure of a repairable system. The times between failures of a 
repairable system often would be related to more general stochastic processes. 
Note that the HF of a density should not be confused with the failure rate or 
failure intensity of a stochastic process, although there is a connection between 
the two for the case of a nonhomogeneous Poisson process. In that case the HF 
of the time to first failure is also the failure intensity of the process, although its 
interpretation is somewhat different depending on whether you are concerned 
with the continuing process or only the first failure. 


PARALLEL AND SERIES SYSTEMS 


Redundancies may be introduced into a system to increase its reliability. If $X_i$, $i = 1, \ldots, m$, denotes the lifetimes of $m$ independent components connected in parallel, then the lifetime of the system, $Y$, is the maximum of the individual components,

$$Y = \max(X_i) = X_{m:m}$$

and

$$Y \sim F_Y(y) = \prod_{i=1}^m F_i(y)$$
The distributions of maximums in general are not very convenient to work with, but at least the reliability of the system for a fixed time can be expressed. The reliability of the parallel system at time $t$, $R_Y(t)$, in terms of the reliabilities of the components, $R_i(t)$, is given by

$$R_Y(t) = 1 - \prod_{i=1}^m [1 - R_i(t)] \tag{16.2.9}$$
If $P_i$ is the probability that the $i$th component functions properly, then the probability that the parallel system of independent components functions properly is

$$P = 1 - \prod_{i=1}^m (1 - P_i) \tag{16.2.10}$$

For example, if $X_i \sim EXP(\theta_i)$, then the reliability of the system at time $t$ is

$$R_Y(t) = 1 - \prod_{i=1}^m \left(1 - e^{-t/\theta_i}\right) \tag{16.2.11}$$


If the components all have a common mean, $\theta_i = \theta$, then it can be shown that the mean time to failure of the system is

$$E(Y) = E(X_{m:m}) = \theta\left(1 + \frac{1}{2} + \cdots + \frac{1}{m}\right) \tag{16.2.12}$$

Thus the mean time to failure increases as each additional parallel component is added; however, the relative gain decreases as $m$ increases.
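Equations (16.2.11) and (16.2.12) can be evaluated directly and checked by simulation. In this sketch the helper names, the seed, and the simulation size are arbitrary choices; note that Python's `random.expovariate` takes the rate $1/\theta$, not the mean.

```python
import math
import random


def parallel_reliability(t, thetas):
    """R_Y(t) = 1 - prod_i (1 - exp(-t/theta_i)) for independent EXP(theta_i) parts (16.2.11)."""
    unreliability = 1.0
    for theta in thetas:
        unreliability *= 1.0 - math.exp(-t / theta)  # P(this component has failed by t)
    return 1.0 - unreliability


def parallel_mean(theta, m):
    """E(Y) = theta (1 + 1/2 + ... + 1/m) for m iid EXP(theta) components (16.2.12)."""
    return theta * sum(1.0 / k for k in range(1, m + 1))


# Diminishing returns: gain in mean life from adding the m-th component (theta = 1).
print([round(parallel_mean(1.0, m) - parallel_mean(1.0, m - 1), 3) for m in range(2, 6)])
# [0.5, 0.333, 0.25, 0.2]

# Monte Carlo check of (16.2.12) for m = 3, theta = 1 (expovariate takes the rate 1/theta).
random.seed(1)
sims = [max(random.expovariate(1.0) for _ in range(3)) for _ in range(100_000)]
print(round(parallel_mean(1.0, 3), 4), round(sum(sims) / len(sims), 4))
```

The gain from the $m$th component is exactly $\theta/m$, which is the diminishing relative gain described above.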

To illustrate the effect on the HF from using a parallel system, consider a parallel system of two independent identically distributed components, $X_i \sim F_X(x)$. The HF for the system, $h_Y(y)$, in terms of the HF, $h_X(x)$, of each component is

$$h_Y(y) = \frac{f_Y(y)}{1 - F_Y(y)} = \frac{2f_X(y)F_X(y)}{1 - [F_X(y)]^2} = \left[\frac{2F_X(y)}{1 + F_X(y)}\right] h_X(y) \tag{16.2.13}$$

For positive continuous variables, the term in brackets goes from 0 to 1 as $y$ goes from 0 to $\infty$. The failure rate of the parallel system is always less than the failure rate of the individual components, but it approaches the failure rate of an individual component as $y \to \infty$.

The HF of a system connected in series is somewhat easier to express. If $X_i$ denotes the failure times of $m$ independent components connected in series, then the failure time of the system, $W$, is the minimum of those of the individual components,

$$W = X_{1:m}$$
In this case

$$\begin{aligned}
F_W(w) &= P[W \le w] \\
&= 1 - P[W > w] \\
&= 1 - P[\text{all } X_i > w] \\
&= 1 - \prod_{i=1}^m [1 - F_i(w)]
\end{aligned} \tag{16.2.14}$$

In terms of the reliability of the system at time $w$,

$$R_W(w) = \prod_{i=1}^m R_i(w) \tag{16.2.15}$$


If one is simply interested in whether the system functions properly, then for the series system

$$P_s = \prod_{i=1}^m P_i \tag{16.2.16}$$

where $P_i$ is the probability of proper functioning of the $i$th component. Clearly the reliability of the system decreases as more components are added.
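In terms of HFs, for the $i$th component,

$$R_i(x) = \exp\left[-\int_0^x h_i(z)\,dz\right]$$

$$\begin{aligned}
R_W(w) = 1 - F_W(w) &= \prod_{i=1}^m R_i(w) \\
&= \exp\left[-\int_0^w \sum_{i=1}^m h_i(z)\,dz\right] \\
&= \exp\left[-\int_0^w h_W(z)\,dz\right]
\end{aligned} \tag{16.2.17}$$

thus

$$h_W(w) = \sum_{i=1}^m h_i(w) \tag{16.2.18}$$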
i=1 


That is, the HF of the system is the sum of the HFs of the individual components. If $X_i \sim EXP(\theta_i)$, then

$$h_W(w) = \sum_{i=1}^m h_i(w) = \sum_{i=1}^m \frac{1}{\theta_i} = c \tag{16.2.19}$$

Because $h_W(w)$ is constant, this implies that

$$W \sim EXP(1/c)$$

If $\theta_i = \theta$, then

$$E(W) = \theta/m \tag{16.2.20}$$
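Likewise, (16.2.19) says that an exponential series system is again exponential, with rate $c = \sum 1/\theta_i$ and mean $1/c$. The sketch below computes $1/c$ for hypothetical component means and compares it with a simulated mean of $W = X_{1:m}$; `series_rate` is a hypothetical helper, and `random.expovariate` again takes the rate $1/\theta_i$.

```python
import random


def series_rate(thetas):
    """c = sum_i 1/theta_i; for independent EXP(theta_i) parts in series, W ~ EXP(1/c) (16.2.19)."""
    return sum(1.0 / theta for theta in thetas)


thetas = [100.0, 200.0, 400.0]  # hypothetical component mean lives
c = series_rate(thetas)
print("mean life of series system, 1/c:", round(1.0 / c, 3))  # 57.143

# Simulate W = min(X_1, X_2, X_3); expovariate takes the rate 1/theta.
random.seed(2)
sims = [min(random.expovariate(1.0 / th) for th in thetas) for _ in range(100_000)]
print("simulated mean of W:", round(sum(sims) / len(sims), 3))
```

With equal means, `1 / series_rate([theta] * m)` reduces to $\theta/m$, which is (16.2.20).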


16.3 EXPONENTIAL DISTRIBUTION


COMPLETE SAMPLES


If $X \sim EXP(\theta)$, then we know that $\bar X$ is the MLE and the UMVUE of $\theta$, the mean of the random variable. The MLE of the reliability at time $t$ is

$$\hat R(t) = e^{-t/\hat\theta} = e^{-t/\bar x} \tag{16.3.1}$$

The MLE of $R(t)$ is not unbiased, and it can be verified that the UMVUE is given by

$$\tilde R(t) = \begin{cases} \left[1 - \dfrac{t}{n\bar x}\right]^{n-1} & n\bar x > t \\ 0 & n\bar x \le t \end{cases} \tag{16.3.2}$$

The MLE does have smaller MSE except when $t/(n\bar x)$ is relatively large or close to zero.

Tests of hypotheses about $\theta$, or monotonic functions of $\theta$ such as the reliability or the HF, are carried out easily based on the property that

$$\frac{2n\bar X}{\theta} \sim \chi^2(2n)$$


We already have seen that a one-sided confidence interval on a percentile is also a one-sided tolerance interval. There also is a close connection between tolerance limits and confidence limits on reliability. For a lower $(\gamma, p)$ tolerance limit, $L(X; \gamma, p)$, the content $p$ is fixed and the lower limit is a random variable. For a lower confidence limit on reliability, $R_L(t, \gamma)$, the lower limit $t$ is fixed and the proportion $R_L$ is a random variable. However, for a given set of sample data, if one determines the value $p^*$ that results in

$$L(x; \gamma, p^*) = t$$

then

$$p^* = R_L(t, \gamma)$$


If $p^*$ is a random variable as defined above, and if $R(t) = p$ and $t = x_{1-p}$, then $p \ge p^*$ if and only if $L(x; \gamma, p) \le L(x; \gamma, p^*) = t$, because increasing the content decreases the lower limit. Thus

$$\begin{aligned} P[R(t) \ge p^*] &= P[p \ge p^*] \\ &= P[L(X; \gamma, p) \le L(X; \gamma, p^*)] \\ &= P[L(X; \gamma, p) \le t] \\ &= P[L(X; \gamma, p) \le x_{1-p}] \\ &= \gamma \end{aligned} \tag{16.3.3}$$


For the exponential distribution

$$x_{1-p} = -\theta\ln p$$

and a $\gamma$ probability tolerance limit for proportion $p$ is

$$L(x; \gamma, p) = -\theta_L\ln p$$

where

$$\theta_L = \frac{2n\bar x}{\chi^2_\gamma(2n)} \tag{16.3.4}$$

is the lower $\gamma$ level confidence limit for $\theta$. To obtain a lower confidence limit on the reliability at time $t$, we may set $t = L(x; \gamma, p^*) = -\theta_L\ln p^*$, and

$$R_L(t) = p^* = e^{-t/\theta_L} = \exp[-t\chi^2_\gamma(2n)/(2n\bar x)] \tag{16.3.5}$$


is a lower $\gamma$ level confidence limit for $R(t)$. Of course, this could be obtained directly in this case because

$$R(t) = e^{-t/\theta}$$

which is a monotonically increasing function of $\theta$, so

$$\gamma = P[\theta_L \le \theta] = P[e^{-t/\theta_L} \le e^{-t/\theta}] = P[R_L(t) \le R(t)]$$

as before.


Example 16.3.1 Consider the following 30 failure times in flying hours of airplane air conditioners (Proschan, 1963):

23, 261, 87, 7, 120, 14, 62, 47, 3, 95, 225, 71, 246, 21, 42, 20, 5, 12,
120, 11, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52

The MLE of $\theta$ is $\hat\theta = \bar x = 59.6$. The MLE of the HF is $\hat h(x) = 1/\hat\theta = 0.017$, and the MLE of the reliability at time $t = 20$ hours is $\hat R(20) = e^{-20/59.6} = 0.715$. A lower 0.95 confidence limit for $\theta$ is

$$\theta_L = \frac{2n\bar x}{\chi^2_{0.95}(60)} = \frac{60(59.6)}{79.08} = 45.22$$

A 95% lower confidence limit for the reliability at $t = 20$ hours is

$$R_L = e^{-t/\theta_L} = e^{-20/45.22} = 0.643$$


We are 95% confident that at least 64.3% of the air conditioners will last at least 20 hours. If we are interested in 90% reliability, then we may set $p = 0.90$ and determine the lower tolerance limit $L(X)$ such that

$$P[R(L(X)) \ge 0.90] = 0.95$$

We have

$$L(X; 0.95, 0.90) = \theta_L(-\ln 0.90) = 45.22(0.105) = 4.75$$

We are 95% confident that at least 90% of the air conditioners will last at least 4.75 hours.
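The calculations of Example 16.3.1 can be reproduced with the short Python sketch below. Python's standard library has no chi-square quantile function, so the sketch uses a small helper, `chi2_ppf_even` (our own, written for illustration), that inverts the exact chi-square CDF for even degrees of freedom by bisection. That is all that is needed for $\chi^2(2n)$.

```python
import math

def chi2_ppf_even(p, df):
    """chi2_p(df) for even df, by bisection on the exact CDF
    P[chi2(2k) <= x] = 1 - sum_{j<k} e^{-x/2} (x/2)^j / j!."""
    def cdf(x):
        term = math.exp(-x / 2.0)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return 1.0 - total
    lo, hi = 0.0, float(df)
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

times = [23, 261, 87, 7, 120, 14, 62, 47, 3, 95, 225, 71, 246, 21, 42,
         20, 5, 12, 120, 11, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52]
n, t, gamma = len(times), 20.0, 0.95
xbar = sum(times) / n                                   # MLE of theta
theta_L = 2 * n * xbar / chi2_ppf_even(gamma, 2 * n)    # eq. (16.3.4)
R_hat = math.exp(-t / xbar)                             # eq. (16.3.1)
R_L = math.exp(-t / theta_L)                            # eq. (16.3.5)
tol = theta_L * (-math.log(0.90))                       # (0.95, 0.90) tolerance limit
print(round(xbar, 1), round(theta_L, 2), round(R_hat, 3), round(R_L, 3), round(tol, 2))
```

With the unrounded factor $-\ln 0.90 = 0.1054$, the tolerance limit comes out near 4.76 rather than the 4.75 obtained with the rounded factor 0.105.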


Many of the results for complete samples can be extended to censored samples. We will make use of Theorems 6.5.4 and 6.5.5, with the notation $X_{i:n}$ representing the $i$th order statistic for a random sample of size $n$.


TYPE II CENSORED SAMPLING


The special properties of the exponential distribution make it a particularly convenient model for analyzing censored data. The joint density function of the $r$ smallest order statistics from a sample of size $n$ is given by

$$f(x_{1:n}, \ldots, x_{r:n}; \theta) = \frac{n!}{(n-r)!\,\theta^r}\exp\left\{-\frac{1}{\theta}\left[\sum_{i=1}^{r} x_{i:n} + (n-r)x_{r:n}\right]\right\}, \quad 0 < x_{1:n} < \cdots < x_{r:n} < \infty \tag{16.3.6}$$


A useful property of the exponential distribution is that differences of consecutive 
order statistics are distributed as independent exponential variables. 


Theorem 16.3.1 Let $Y_i = X_i/\theta \sim \mathrm{EXP}(1)$, $i = 1, \ldots, n$, be $n$ independent exponential variables, and let

$$W_1 = Y_{1:n}, \quad W_2 = Y_{2:n} - Y_{1:n}, \quad \ldots, \quad W_n = Y_{n:n} - Y_{n-1:n}$$

then

1. $W_1, \ldots, W_n$ are independent.
2. $W_i \sim \mathrm{EXP}(1/(n - i + 1))$.
3. The $k$th order statistic may be expressed as a linear combination of $k$ independent exponential variables, $Y_{k:n} = \sum_{j=1}^{k} Z_j/(n - j + 1)$, where $Z_j \sim \mathrm{EXP}(1)$.
4. $E(Y_{k:n}) = \sum_{j=1}^{k} (n - j + 1)^{-1}$.
jal 
Proof

$$f(y_{1:n}, \ldots, y_{n:n}) = n!\exp\left[-\sum_{i=1}^{n} y_{i:n}\right] = n!\exp\left[-\sum_{i=1}^{n}(n - i + 1)(y_{i:n} - y_{i-1:n})\right]$$

where $y_{0:n} = 0$. Consider the joint transformation

$$w_i = y_{i:n} - y_{i-1:n}, \quad i = 1, \ldots, n$$

with inverse transformation

$$y_{1:n} = w_1, \quad y_{2:n} = w_1 + w_2, \quad \ldots, \quad y_{n:n} = w_1 + \cdots + w_n$$

with Jacobian $J = 1$. This gives

$$f(w_1, \ldots, w_n) = n!\exp\left[-\sum_{i=1}^{n}(n - i + 1)w_i\right] = \prod_{i=1}^{n}(n - i + 1)e^{-(n - i + 1)w_i}, \quad 0 < w_i < \infty$$

Thus we recognize that the $W_i$ are independent exponential variables as stated in parts 1 and 2. Also, from the above transformation it follows that we may express $Y_{k:n}$ as

$$Y_{k:n} = W_1 + \cdots + W_k \tag{16.3.7}$$

but $Z_i \sim \mathrm{EXP}(1)$ implies $Z_i/(n - i + 1) \sim \mathrm{EXP}(1/(n - i + 1))$, so part 3 follows,
and part 4 follows immediately from part 3.
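Theorem 16.3.1 lends itself to a simulation check. The Python sketch below (seed, $n$, and replication count are arbitrary choices of ours) draws sorted EXP(1) samples and forms the normalized spacings $(n - i + 1)W_i$. By parts 1 and 2 these should each average about 1. It also compares simulated means of $Y_{k:n}$ with part 4.

```python
import random

def normalized_spacings(n, rng):
    """Sort n iid EXP(1) draws; return ((n - i + 1) * W_i for i = 1..n, sorted draws)."""
    y = sorted(rng.expovariate(1.0) for _ in range(n))
    scaled, prev = [], 0.0
    for i, yi in enumerate(y, start=1):
        scaled.append((n - i + 1) * (yi - prev))   # (n - i + 1) W_i ~ EXP(1)
        prev = yi
    return scaled, y

rng = random.Random(2024)
n, reps = 5, 20000
spacing_means = [0.0] * n
order_means = [0.0] * n
for _ in range(reps):
    scaled, y = normalized_spacings(n, rng)
    for i in range(n):
        spacing_means[i] += scaled[i] / reps
        order_means[i] += y[i] / reps

# Part 4 of Theorem 16.3.1: E(Y_{k:n}) = sum_{j=1}^{k} 1/(n - j + 1)
exact = [sum(1.0 / (n - j + 1) for j in range(1, k + 1)) for k in range(1, n + 1)]
print([round(m, 3) for m in spacing_means])
print([round(m, 3) for m in order_means])
print([round(e, 3) for e in exact])
```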


Let us now consider the problem of estimation and hypothesis testing for the Type II censored sampling case. The MLE of $\theta$ based on the joint density, $f(x_{1:n}, \ldots, x_{r:n}; \theta)$, of the first $r$ order statistics is easily seen to be

$$\hat\theta = \frac{\sum_{i=1}^{r} x_{i:n} + (n - r)x_{r:n}}{r} \tag{16.3.8}$$
The properties of $\hat\theta$ for the censored sampling case are amazingly similar to those for the complete sample case.

Let $\hat\theta$ denote the MLE based on the first $r$ order statistics from a sample of size $n$ from $\mathrm{EXP}(\theta)$. Then

1. $\hat\theta$ is a complete, sufficient statistic for $\theta$.
2. $\hat\theta$ is the UMVUE of $\theta$.
3. $2r\hat\theta/\theta \sim \chi^2(2r)$.

Proof

Because $f(x_{1:n}, \ldots, x_{r:n}; \theta)$ is a member of the (multivariate) exponential class, it follows that $\hat\theta$ is a complete, sufficient statistic for $\theta$. By uniqueness, $\hat\theta$ will be the UMVUE of $\theta$ if it is unbiased. The unbiasedness of $\hat\theta$ will follow easily from part 3. To verify part 3, note that $\hat\theta$ may be rearranged to obtain

$$\frac{2r\hat\theta}{\theta} = \frac{2\sum_{i=1}^{r}(n - i + 1)(X_{i:n} - X_{i-1:n})}{\theta} = 2\sum_{i=1}^{r}(n - i + 1)W_i = 2\sum_{i=1}^{r} Z_i$$

where $X_{0:n} = 0$, the $W_i$ are as defined in Theorem 16.3.1, and the $Z_i \sim \mathrm{EXP}(1)$ are independent. Because each $2Z_i \sim \chi^2(2)$, part 3 follows.


Recall in the complete sample case that

$$\frac{2n\hat\theta}{\theta} \sim \chi^2(2n) \tag{16.3.9}$$

This is a very unusual situation in which the censored sampling results essentially are identical to the complete sample results if $n$ is replaced by $r$. Indeed, all of the confidence intervals and tests described earlier also apply to the Type II censored sampling case by replacing $n$ by $r$.

The disadvantage of censored sampling is the extra cost involved with sampling the additional $n - r$ items and placing them on test. The principal advantage is that it may take much less time for the first $r$ failures of $n$ items to occur than for all $r$ items in a random sample of size $r$. Yet the efficiency and precision involved in the two cases are exactly the same. By part 4 of Theorem 16.3.1, the relative expected experiment time in the two cases may be expressed as

$$\mathrm{REET} = \frac{E(X_{r:n})}{E(X_{r:r})} = \frac{\sum_{j=1}^{r} 1/(n - j + 1)}{\sum_{j=1}^{r} 1/(r - j + 1)} \tag{16.3.10}$$

A few values of REET are given below for $r = 10$ to illustrate the substantial savings in time that may be realized:

n:      11    12    13    15    20    30
REET:   0.69  0.55  0.46  0.35  0.23  0.14
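Equation (16.3.10) is easy to evaluate directly. The Python sketch below (the function name `reet` is our own) reproduces the table above from the expected order statistics of Theorem 16.3.1.

```python
def reet(r, n):
    """Relative expected experiment time E(X_{r:n}) / E(X_{r:r}), eq. (16.3.10),
    using E(Y_{k:n}) = sum_{j=1}^{k} 1/(n - j + 1); the factor theta cancels."""
    num = sum(1.0 / (n - j + 1) for j in range(1, r + 1))
    den = sum(1.0 / (r - j + 1) for j in range(1, r + 1))
    return num / den

for n in (11, 12, 13, 15, 20, 30):
    print(n, round(reet(10, n), 2))
```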


A reasonable approach is to choose r and n to minimize the expected value of 
some cost function involving the cost of the units and the cost related to the 
length of the experiment. 


Example 16.3.2 Consider again the data in Example 16.3.1. As mentioned earlier, in many life-testing cases, the data will occur naturally ordered. This was perhaps not the case in the air conditioner data, but let us consider an analysis based on the 20 smallest ordered observations for illustration purposes. These are

1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21, 23, 42,
47, 52, 62


The MLE and UMVUE of $\theta$ is $\hat\theta = 51.1$ based on these 20 observations. The MLE for reliability at time $t = 20$ is now $\hat R(20) = 0.676$. A lower 95% confidence limit for $\theta$ is

$$\theta_L = \frac{2r\hat\theta}{\chi^2_{0.95}(2r)} = \frac{40(51.1)}{55.76} = 36.66$$

A 95% lower confidence limit for reliability at $t = 20$ hours is

$$R_L = e^{-t/\theta_L} = e^{-20/36.66} = 0.58$$

The lower (0.95, 0.90) tolerance limit becomes

$$L(X; 0.95, 0.90) = \theta_L(-\ln 0.90) = 36.66(0.105) = 3.85$$


There are, of course, some differences between these numbers and the complete 
sample numbers because of random variation, but the censored values have the 
same accuracy as if they had been based on a complete sample of size 20. 
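The censored-sample numbers can be checked the same way. In this Python sketch, `type2_mle` implements (16.3.8) and, as an illustration-only helper of ours, `chi2_ppf_even` inverts the exact chi-square CDF for even degrees of freedom. The confidence limit uses $2r\hat\theta/\theta \sim \chi^2(2r)$, that is, $n$ replaced by $r$.

```python
import math

def chi2_ppf_even(p, df):
    """chi2_p(df) for even df, by bisection on the exact Poisson-sum CDF."""
    def cdf(x):
        term = math.exp(-x / 2.0)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return 1.0 - total
    lo, hi = 0.0, float(df)
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type2_mle(first_r, n):
    """MLE of theta from the r smallest (sorted) of n exponential lifetimes, eq. (16.3.8)."""
    r = len(first_r)
    return (sum(first_r) + (n - r) * first_r[-1]) / r

first20 = [1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21, 23, 42, 47, 52, 62]
n, r, t = 30, len(first20), 20.0
theta_hat = type2_mle(first20, n)
theta_L = 2 * r * theta_hat / chi2_ppf_even(0.95, 2 * r)   # n replaced by r
R_hat, R_L = math.exp(-t / theta_hat), math.exp(-t / theta_L)
tol = theta_L * (-math.log(0.90))
print(theta_hat, round(R_hat, 3), round(theta_L, 2), round(R_L, 3), round(tol, 2))
```

As in the complete-sample case, using the unrounded $-\ln 0.90$ gives a tolerance limit of about 3.86 instead of 3.85.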
TYPE I CENSORED SAMPLING


Statistical analyses based on Type I censored sampling generally are more complicated than for the Type II case. In this case the length of the experiment, say $t$, is fixed but the number of values observed in time $t$ is a random variable. The number of items failing before time $t$ follows a binomial distribution

$$R \sim \mathrm{BIN}(n, p)$$

where $p = F_X(t) = 1 - e^{-t/\theta}$, when sampling from an exponential distribution.
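Now the distribution of an exponential variable $X$ given that $X \le t$ is a truncated exponential distribution

$$F(x \mid X \le t) = P[X \le x \mid X \le t] = \begin{cases} \dfrac{F_X(x)}{F_X(t)} = \dfrac{1 - e^{-x/\theta}}{1 - e^{-t/\theta}} & 0 < x \le t \\ 1 & x > t \end{cases} \tag{16.3.11}$$

Also

$$f(x \mid X \le t) = \frac{e^{-x/\theta}}{\theta(1 - e^{-t/\theta})}, \quad 0 < x \le t \tag{16.3.12}$$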


Thus, given $R = r$, the conditional density of the first $r$ failure times is equivalent to the joint density of an ordered random sample of size $r$ from a truncated exponential distribution,

$$f(x_{1:n}, \ldots, x_{r:n} \mid r) = r!\prod_{i=1}^{r}\frac{f_X(x_{i:n})}{F_X(t)} = \frac{r!}{\theta^r(1 - e^{-t/\theta})^r}\exp\left(-\sum_{i=1}^{r} x_{i:n}/\theta\right) \tag{16.3.13}$$


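The joint density of obtaining $R = r$ ordered observations at the values $x_{1:n}, \ldots, x_{r:n}$ before time $t$ may be expressed as

$$f(x_{1:n}, \ldots, x_{r:n}) = f(x_{1:n}, \ldots, x_{r:n} \mid r)\,b(r; n, p) = \frac{n!}{(n-r)!\,\theta^r}\exp\left\{-\frac{1}{\theta}\left[\sum_{i=1}^{r} x_{i:n} + (n - r)t\right]\right\} \tag{16.3.14}$$

if $0 < x_{1:n} < \cdots < x_{r:n} < t$ for $r = 1, 2, \ldots, n$, and $P[R = 0] = (1 - p)^n = e^{-nt/\theta}$ otherwise.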
Note that this joint density has exactly the same form as in the Type II case with $x_{r:n}$ replaced by $t$. In this case, the MLE of $\theta$ is

$$\hat\theta = \frac{\sum_{i=1}^{r} x_{i:n} + (n - r)t}{r} \tag{16.3.15}$$

It is interesting to note that in both cases $\hat\theta$ is of the form $T/r$, where $T$ represents the total surviving time of the $n$ units on test until the termination of the experiment.


In this case $\hat\theta$ and $r$, or $\sum_{i=1}^{r} x_{i:n}$ and $r$, are joint sufficient statistics for $\theta$. Generally it is not clear how to develop optimal statistical procedures for a single parameter based on two sufficient statistics. Reasonably good inference procedures for $\theta$ can be based on $R$ alone, even though this makes use of only the number of failures and not their values. In this case $R \sim \mathrm{BIN}(n, p)$, where $p$ is a monotonic function of $\theta$, so the usual binomial procedures can be adapted to apply to $\theta$. Additional results are given by Bain and Engelhardt (1991).
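A minimal Python sketch of (16.3.15) follows, applied to the air conditioner data of Example 16.3.1 with a hypothetical censoring time $t = 62$ hours. With $t$ equal to the 20th order statistic, the Type I estimate coincides with the Type II value 51.1, as the remark about the common form $T/r$ suggests. The second function is our own illustration of using $R$ alone: it solves $r/n = 1 - e^{-t/\theta}$ (the binomial MLE $\hat p = r/n$ carried over by invariance). It is not a procedure given in the text.

```python
import math

def type1_mle(times, n, t):
    """Type I (time-censored) MLE, eq. (16.3.15): theta_hat = T/r, where
    T = (sum of failure times up to t) + (n - r) t is the total time on test."""
    failures = [x for x in times if x <= t]
    r = len(failures)
    if r == 0:
        raise ValueError("the MLE does not exist when no failures occur by time t")
    return (sum(failures) + (n - r) * t) / r, r

def count_only_estimate(r, n, t):
    """Estimate theta from the failure count alone, solving r/n = 1 - exp(-t/theta)."""
    if not 0 < r < n:
        raise ValueError("need 0 < r < n")
    return -t / math.log(1.0 - r / n)

# Air conditioner data of Example 16.3.1, censored at a hypothetical t = 62 hours.
times = [23, 261, 87, 7, 120, 14, 62, 47, 3, 95, 225, 71, 246, 21, 42,
         20, 5, 12, 120, 11, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52]
theta_hat, r = type1_mle(times, n=30, t=62.0)
print(r, theta_hat)
print(count_only_estimate(r, 30, 62.0))
```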


TYPE I CENSORED SAMPLING (WITH REPLACEMENT)


As we have seen, the manner in which the sampling is carried out affects the probability structure and statistical analysis. It may be of interest to consider the effect of sampling with replacement in these cases.

Suppose that test equipment is available for testing $n$ units simultaneously. It may make more economical use of the test equipment to replace failed items immediately after failure and continue testing. As the experiment continues, the failure times are measured from the start of the experiment, whether the failed item was an original or a replacement item. These failure times will be naturally ordered, but they are not the usual order statistics, so we denote these successive failure times by $s_i$, $i = 1, 2, 3, \ldots$. Note that the number of observations may exceed $n$ in this case.

A physical example of how such data may occur is as follows. Consider a 
system of n identical components in series, in which a component is replaced each 
time it fails. Each time a component fails, the system fails, so the system failure 
times would correspond to the type of observations described above. The special 
properties of the exponential distribution make the mathematics tractable for this 
type of sampling. In particular, this situation can be related to the Poisson 
process for the Type I censoring case. (See Section 16.5 for details.) 

Suppose that the successive failure times from $n$ positions are recorded until a fixed time $t$. If the times between failures are independent exponential variables with mean $\theta$, then the failure times for each position represent the occurrences of a Poisson process with $\lambda = 1/\theta$. Now it follows from the properties of Poisson processes that if $n$ independent Poisson processes with intensity parameters $1/\theta$ are occurring simultaneously, then the combined occurrences may be considered to come from a single Poisson process with intensity parameter $\nu = n/\theta$. Thus, for example, the number of failures in time $t$ from the total experiment follows a Poisson distribution with parameter $\nu t = nt/\theta$, $R \sim \mathrm{POI}(nt/\theta)$, so

$$f_R(r) = \frac{e^{-nt/\theta}(nt/\theta)^r}{r!}, \quad r = 0, 1, \ldots \tag{16.3.16}$$


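If we let $T_1 = S_1$ and $T_i = S_i - S_{i-1}$, $i = 2, 3, \ldots$, denote the interarrival times of occurrences from the $n$ superimposed Poisson processes, then the joint density of $T_1, \ldots, T_r$ is

$$f(t_1, \ldots, t_r) = \nu^r\exp\left[-\nu\sum_{i=1}^{r} t_i\right]$$

and transforming gives the joint density of the first $r$ successive failures,

$$f(s_1, \ldots, s_r) = \nu^r\exp\left\{-\nu\left[\sum_{i=2}^{r}(s_i - s_{i-1}) + s_1\right]\right\} = \nu^r e^{-\nu s_r} \tag{16.3.17}$$

where $\nu = n/\theta$. Now for Type I censoring at $t$, the likelihood function on $0 < s_1 < \cdots < s_r < t$ for $r = 1, 2, 3, \ldots$ is given by

$$\begin{aligned} f(s_1, \ldots, s_r, r) &= P[T_{r+1} > t - s_r \mid s_1, \ldots, s_r]\,f(s_1, \ldots, s_r) \\ &= P[T_{r+1} > t - s_r]\,f(s_1, \ldots, s_r) \\ &= e^{-\nu(t - s_r)}\,\nu^r e^{-\nu s_r} \\ &= \nu^r e^{-\nu t} = \left(\frac{n}{\theta}\right)^r e^{-nt/\theta}, \quad 0 < s_1 < \cdots < s_r < t, \; r = 1, 2, \ldots \end{aligned} \tag{16.3.18}$$

Also

$$P[R = 0] = P[S_1 > t] = e^{-\nu t}, \quad r = 0$$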


It is interesting to note at this point that given $R = r$, the successive failure times are distributed as ordered uniform variables.

Let events occur according to a Poisson process with intensity $\nu$, and let $S_1, S_2, \ldots$ denote the successive times of occurrence. Given that $r$ events occurred in the interval $(0, t)$, then conditionally $S_1, \ldots, S_r$ are distributed as ordered observations from the uniform distribution on $(0, t)$.
Proof

We have

$$f(s_1, \ldots, s_r \mid r) = \frac{f(s_1, \ldots, s_r, r)}{f_R(r)} = \frac{\nu^r e^{-\nu t}}{e^{-\nu t}(\nu t)^r/r!} = \frac{r!}{t^r}, \quad 0 < s_1 < \cdots < s_r < t \tag{16.3.19}$$

This is the joint density of the order statistics of a sample of size $r$ from the uniform density $f(x) = 1/t$, $0 < x < t$, and zero otherwise.
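The conditional-uniform result can be checked by simulation. The Python sketch below uses hypothetical settings of our choosing ($n = 4$, $\theta = 2$, $t = 5$, so $E(R) = 10$). It generates the superimposed process from exponential interarrival times, keeps only runs with exactly $r = 10$ failures, and compares the average first and fifth failure times with the ordered-uniform means $kt/(r + 1)$.

```python
import random

def superimposed_failures(n, theta, t, rng):
    """Failure times in (0, t] for n positions tested with immediate replacement,
    each unit EXP(theta); simulated as one Poisson process with rate n/theta."""
    nu, s, times = n / theta, 0.0, []
    while True:
        s += rng.expovariate(nu)
        if s > t:
            return times
        times.append(s)

rng = random.Random(7)
n, theta, t, r_target = 4, 2.0, 5.0, 10      # hypothetical settings; E(R) = nt/theta = 10
sum_first = sum_fifth = 0.0
hits = 0
for _ in range(40000):
    s = superimposed_failures(n, theta, t, rng)
    if len(s) == r_target:                   # condition on R = r
        hits += 1
        sum_first += s[0]
        sum_fifth += s[4]
mean_first, mean_fifth = sum_first / hits, sum_fifth / hits
# Ordered uniforms on (0, t): E(S_k | R = r) = k t / (r + 1)
print(hits, mean_first, t / (r_target + 1), mean_fifth, 5 * t / (r_target + 1))
```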


Returning to the likelihood function, the MLE of $\theta$ is seen easily to be

$$\hat\theta = \frac{nt}{r}, \quad \text{if } r > 0 \tag{16.3.20}$$

in this case. The MLE is again in the form $T/r$ where $T = nt$ represents the total test time accrued by the items in the experiment before termination.

It is interesting that with replacement the statistic $R$ now becomes a single sufficient statistic. Furthermore, $R \sim \mathrm{POI}(nt/\theta)$, so that previously developed techniques for the Poisson distribution can be readily applied here to obtain tests or confidence intervals on $\theta$. For example, a lower $1 - \alpha$ level confidence limit for $\theta$ based on $r$ is

$$\theta_L = 2nt/\chi^2_{1-\alpha}(2r + 2) \tag{16.3.21}$$

and an upper $1 - \alpha$ level confidence limit for $\theta$ is

$$\theta_U = 2nt/\chi^2_{\alpha}(2r) \tag{16.3.22}$$

These again may be slightly conservative because of the discreteness of the Poisson distribution.


Example 16.3.3 Consider a chain with 20 links, with the failure time of each link distributed as 
$\mathrm{EXP}(\theta)$. The chain is placed in service; each time a link breaks, it is replaced by
a new one and the failure time is recorded. The experiment is conducted for 100 
hours, and the following 25 successive failure times are recorded: 


5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5, 50.3, 56.4, 61.7, 
62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1, 85.5, 90.8, 92.7, 95.5, 95.6 
The MLE of $\theta$ is $\hat\theta = 20(100)/25 = 80$. Suppose that we wish to test $H_0: \theta \ge 100$ against $H_a: \theta < 100$ at the $\alpha = 0.05$ significance level. We have

$$\frac{2r\hat\theta}{\theta_0} = \frac{2(25)(80)}{100} = 40 > \chi^2_{0.05}(50) = 34.76$$

so $H_0$ cannot be rejected at the 0.05 level. A lower 95% confidence limit for $\theta$ is

$$\theta_L = \frac{2(20)(100)}{\chi^2_{0.95}(52)} = 57.3$$

Note that this set of data actually was generated from an exponential distribution with $\theta = 100$.
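A Python sketch of Example 16.3.3 follows. The rejection rule coded here is our reading of the test: reject $H_0: \theta \ge \theta_0$ when $2nt/\theta_0 < \chi^2_\alpha(2r)$. This is equivalent to $\theta_U < \theta_0$ with $\theta_U$ from (16.3.22). The helper `chi2_ppf_even` is our own, exact only for even degrees of freedom, which covers $2r$ and $2r + 2$.

```python
import math

def chi2_ppf_even(p, df):
    """chi2_p(df) for even df, by bisection on the exact Poisson-sum CDF."""
    def cdf(x):
        term = math.exp(-x / 2.0)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return 1.0 - total
    lo, hi = 0.0, float(df)
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, t, theta0 = 20, 100.0, 100.0
failures = [5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5, 50.3, 56.4,
            61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1, 85.5, 90.8, 92.7, 95.5, 95.6]
r = len(failures)                                      # R ~ POI(nt/theta)
theta_hat = n * t / r                                  # eq. (16.3.20)
stat = 2 * r * theta_hat / theta0                      # equals 2nt/theta_0
crit = chi2_ppf_even(0.05, 2 * r)                      # reject H0 if stat < crit
theta_L = 2 * n * t / chi2_ppf_even(0.95, 2 * r + 2)   # eq. (16.3.21), alpha = 0.05
theta_U = 2 * n * t / chi2_ppf_even(0.05, 2 * r)       # eq. (16.3.22), alpha = 0.05
print(r, theta_hat, stat, round(crit, 2), round(theta_L, 1), round(theta_U, 1))
```

The upper limit of about 115 exceeds 100, which is another way of seeing that $H_0$ is not rejected.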


TYPE II CENSORED SAMPLING (WITH REPLACEMENT)


Again suppose that $n$ units are being tested with replacement, but that the experiment continues until $r$ failures have occurred. As before, the ordered successive failure times, $s_1, \ldots, s_r$, are measured from the beginning of the experiment without regard to whether they were original or replacement units when they fail. If the failure time of an individual unit follows the exponential distribution, $\mathrm{EXP}(\theta)$, then the superimposed failures from the $n$ positions may be considered to be the occurrences of a Poisson process with intensity parameter $\nu = n/\theta$, as discussed earlier for the time-truncated case. Thus, the interarrival times $y_1 = s_1$, $y_i = s_i - s_{i-1}$, $i = 2, \ldots, r$, are independent exponential variables with mean $\theta/n$,

$$f(y_1, \ldots, y_r) = \left(\frac{n}{\theta}\right)^r\exp\left[-\frac{n}{\theta}\sum_{i=1}^{r} y_i\right]$$

Transforming back to the $s_i$, $\sum_{i=1}^{r} y_i = s_r$, with Jacobian $J = 1$, and

$$f(s_1, \ldots, s_r) = \left(\frac{n}{\theta}\right)^r e^{-ns_r/\theta}, \quad 0 < s_1 < \cdots < s_r < \infty \tag{16.3.23}$$

The MLE in this case is

$$\hat\theta = \frac{ns_r}{r} \tag{16.3.24}$$

where $ns_r$ is again the accrued survival time of all units involved in the test until the experiment ends. The likelihood function is a member of the multivariate exponential class, and $s_r$ is a complete, sufficient statistic for $\theta$. As noted above, in terms of the independent exponential variables $Y_i$,

$$S_r = \sum_{i=1}^{r} Y_i$$
so

$$\frac{2r\hat\theta}{\theta} = \frac{2nS_r}{\theta} \sim \chi^2(2r) \tag{16.3.25}$$

This is again a somewhat astonishing result, because exactly the same result is obtained for the Type II censored sampling without replacement, as well as for the complete sample case with $n = r$. It follows that all of the statistical analyses available for the complete sample case may be applied directly to the Type II censored sampling case with replacement by using the above $\hat\theta$ and replacing $n$ by $r$ in the previous formulas.

It is clear that identical results may be achieved by placing $r$ units on test and conducting the experiment with replacement until $r$ units fail, or beginning the experiment with $n$ units on test and conducting the experiment with replacement until $r$ units fail. The expected experiment time in the latter case is

$$E(S_r) = \sum_{i=1}^{r} E(Y_i) = \frac{r\theta}{n} \tag{16.3.26}$$

so the relative expected experiment time of the latter case to the first case is

$$\mathrm{REET} = \frac{r\theta/n}{r\theta/r} = \frac{r}{n} \tag{16.3.27}$$
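The superposition argument behind (16.3.26) can be checked by simulating the $n$ test positions directly rather than assuming a Poisson process. In the Python sketch below (hypothetical settings $n = 20$, $r = 10$, $\theta = 100$), a heap holds the next failure time of each position, and each failed unit is replaced at the moment it fails. The simulated mean of $S_r$ should be close to $r\theta/n = 50$.

```python
import heapq
import random

def time_to_rth_failure(n, r, theta, rng):
    """Test n positions with immediate replacement (every unit EXP(theta)) and
    return S_r, the time of the r-th failure counted over all positions."""
    pending = [rng.expovariate(1.0 / theta) for _ in range(n)]   # next failure per position
    heapq.heapify(pending)
    s = 0.0
    for _ in range(r):
        s = heapq.heappop(pending)                                 # earliest failure
        heapq.heappush(pending, s + rng.expovariate(1.0 / theta))  # replacement starts at s
    return s

rng = random.Random(11)
n, r, theta, reps = 20, 10, 100.0, 5000     # hypothetical settings
mean_sr = sum(time_to_rth_failure(n, r, theta, rng) for _ in range(reps)) / reps
print(mean_sr, r * theta / n, r / n)        # simulated E(S_r), eq. (16.3.26), REET (16.3.27)
```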


Thus substantial savings in time may be achieved by beginning with additional 
units. The value in saving time must be weighed against the cost of testing addi- 
tional units to decide on the appropriate censoring fraction to use. 

The relative experiment time gained by using replacement also can be determined by comparing the expected time required to obtain $n$ failures from $n$ units with replacement, $E(S_n)$, with the time required to obtain $n$ failures without replacement, $E(X_{n:n})$.

Example 16.3.4 Consider again the data given in Example 16.3.3, but suppose that Type II censored sampling with $r = 20$ was used. Based on the first 20 observations, the MLE of $\theta$ is

$$\hat\theta = \frac{ns_{20}}{r} = \frac{20(83.1)}{20} = 83.1$$

To test $H_0: \theta \ge 100$ against $H_a: \theta < 100$ in this case, we consider

$$\frac{2r\hat\theta}{\theta_0} = \frac{2(20)(83.1)}{100} = 33.2 > \chi^2_{0.05}(40) = 26.5$$

so again $H_0$ cannot be rejected at the 0.05 level. Similarly, tests for reliability, tolerance limits, and so on can be carried out for this case.
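The Type II with-replacement computation of Example 16.3.4 is sketched below in Python. It uses (16.3.24) and the pivotal result (16.3.25), with the same illustration-only even-df chi-square helper as before.

```python
import math

def chi2_ppf_even(p, df):
    """chi2_p(df) for even df, by bisection on the exact Poisson-sum CDF."""
    def cdf(x):
        term = math.exp(-x / 2.0)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return 1.0 - total
    lo, hi = 0.0, float(df)
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, r, theta0 = 20, 20, 100.0
failures = [5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5,
            50.3, 56.4, 61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1]  # first 20 of Example 16.3.3
s_r = failures[r - 1]
theta_hat = n * s_r / r                 # eq. (16.3.24)
stat = 2 * r * theta_hat / theta0       # chi2(2r) when theta = theta_0, eq. (16.3.25)
crit = chi2_ppf_even(0.05, 2 * r)
print(theta_hat, round(stat, 2), round(crit, 2), "reject" if stat < crit else "do not reject")
```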
16.4 WEIBULL DISTRIBUTION


We have seen that the exponential distribution is an important life-testing model 
that is very simple to analyze statistically. However, it is somewhat restrictive in 
that it is applicable only where a constant HF is reasonably appropriate. The 
Weibull distribution represents a generalization of the exponential distribution 
that is considerably more flexible, because it allows for either an increasing or a 
decreasing HF. The Weibull distribution has been shown empirically to provide a 
good model for a great many types of variables. Also recall that the Weibull 
distribution is one of the three limiting extreme-value distributions. This may
provide some theoretical justification for its use in certain cases. For example, the 
strength of a long chain (or the failure time of a system in series) is equal to that 
of the weakest link or component, and the limiting distribution of the minimum 
is a Weibull distribution in many cases. Similarly, the breaking strength of a 
ceramic would be that of its weakest flaw. 

Also recall that if $X \sim \mathrm{WEI}(\theta, \beta)$, then $Y = \ln X \sim \mathrm{EV}(1/\beta, \ln\theta)$. Thus, any statistical results developed for the Weibull distribution also can be applied easily to the Type I extreme-value model and vice versa. Indeed, in the Type I extreme-value notation, the parameters are location-scale parameters, so it often is more convenient to develop techniques in the extreme-value notation first. If $Y \sim \mathrm{EV}(\delta, \xi)$, then

$$F_Y(y) = 1 - \exp\left[-\exp\left(\frac{y - \xi}{\delta}\right)\right], \quad -\infty < y < \infty, \; -\infty < \xi < \infty, \; \delta > 0$$

and $X = e^Y \sim \mathrm{WEI}(\theta, \beta)$, where $\xi = \ln\theta$ and $\delta = 1/\beta$. For example, if $x_1, \ldots, x_n$ represents a sample of size $n$ from a Weibull distribution, then letting $y_i = \ln x_i$, $i = 1, \ldots, n$, produces a sample from an extreme-value distribution. A test for $\xi$ may be developed based on the $y_i$, and then this test could be restated in terms of $\theta = \exp(\xi)$.
MAXIMUM LIKELIHOOD ESTIMATION 


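Let $X \sim \mathrm{WEI}(\theta, \beta)$. Then the likelihood function for the first $r$ ordered observations from a random sample of size $n$ under Type II censored sampling is given by

$$\begin{aligned} L(\theta, \beta) &= \frac{n!}{(n-r)!}\left[\prod_{i=1}^{r} f(x_{i:n})\right][1 - F_X(x_{r:n})]^{n-r} \\ &= \frac{n!}{(n-r)!}\left(\frac{\beta}{\theta}\right)^r\prod_{i=1}^{r}\left(\frac{x_{i:n}}{\theta}\right)^{\beta-1}\exp\left\{-\left[\sum_{i=1}^{r}\left(\frac{x_{i:n}}{\theta}\right)^\beta + (n - r)\left(\frac{x_{r:n}}{\theta}\right)^\beta\right]\right\} \end{aligned} \tag{16.4.1}$$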
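Setting the partial derivatives with respect to $\theta$ and $\beta$ equal to zero gives the MLEs $\hat\theta$ and $\hat\beta$ as solutions to the equations

$$\frac{\sum_{i=1}^{r} x_{i:n}^{\hat\beta}\ln x_{i:n} + (n - r)x_{r:n}^{\hat\beta}\ln x_{r:n}}{\sum_{i=1}^{r} x_{i:n}^{\hat\beta} + (n - r)x_{r:n}^{\hat\beta}} - \frac{1}{\hat\beta} = \frac{1}{r}\sum_{i=1}^{r}\ln x_{i:n} \tag{16.4.2}$$

and

$$\hat\theta = \left[\frac{\sum_{i=1}^{r} x_{i:n}^{\hat\beta} + (n - r)x_{r:n}^{\hat\beta}}{r}\right]^{1/\hat\beta} \tag{16.4.3}$$

For the special case of complete samples, where $n = r$, the equations reduce to

$$\frac{\sum_{i=1}^{n} x_i^{\hat\beta}\ln x_i}{\sum_{i=1}^{n} x_i^{\hat\beta}} - \frac{1}{\hat\beta} = \frac{1}{n}\sum_{i=1}^{n}\ln x_i \tag{16.4.4}$$

and

$$\hat\theta = \left[\frac{\sum_{i=1}^{n} x_i^{\hat\beta}}{n}\right]^{1/\hat\beta} \tag{16.4.5}$$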


In either case, the first equation cannot be solved in closed form. However, it has been shown that the MLEs are unique solutions of these equations. The Newton-Raphson procedure for solving an equation $g(\beta) = 0$ is to determine successive approximations $\beta_j$, where $\beta_{j+1} = \beta_j - g(\beta_j)/g'(\beta_j)$. Many other techniques also are available with a computer.
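The Newton-Raphson step can be sketched in Python as follows, writing (16.4.2) as $g(\beta) = A(\beta)/B(\beta) - 1/\beta - \overline{\ln x} = 0$, where $B = \sum w_i x_i^\beta$, $A = \sum w_i x_i^\beta\ln x_i$, and the weight on $x_{r:n}$ is $1 + (n - r)$. The derivative $g'(\beta) = (CB - A^2)/B^2 + 1/\beta^2$ with $C = \sum w_i x_i^\beta(\ln x_i)^2$ is our own derivation, and the step-halving guard that keeps $\beta > 0$ is an added safeguard, not part of the text. Applied to the Harter (1969) data used in the example later in this section, it should return values close to $\hat\beta = 2.09$ and $\hat\theta = 83.8$.

```python
import math

def weibull_mle(first_r, n, beta0=1.0, tol=1e-10, max_iter=100):
    """MLEs (theta_hat, beta_hat) for Weibull data under Type II censoring,
    eqs. (16.4.2)-(16.4.3); pass n == len(first_r) for a complete sample."""
    x = sorted(first_r)
    r = len(x)
    logs = [math.log(v) for v in x]
    w = [1.0] * r
    w[-1] += n - r                          # x_{r:n} also stands in for the n - r survivors
    mean_log = sum(logs) / r

    def g_and_dg(beta):
        p = [wi * math.exp(beta * li) for wi, li in zip(w, logs)]   # w_i x_i^beta
        B = sum(p)
        A = sum(pi * li for pi, li in zip(p, logs))
        C = sum(pi * li * li for pi, li in zip(p, logs))
        return A / B - 1.0 / beta - mean_log, (C * B - A * A) / (B * B) + 1.0 / beta ** 2

    beta = beta0
    for _ in range(max_iter):
        g, dg = g_and_dg(beta)
        new_beta = beta - g / dg                 # Newton-Raphson step
        if new_beta <= 0:                        # safeguard: stay in the parameter space
            new_beta = beta / 2.0
        if abs(new_beta - beta) < tol:
            beta = new_beta
            break
        beta = new_beta
    B = sum(wi * math.exp(beta * li) for wi, li in zip(w, logs))
    return (B / r) ** (1.0 / beta), beta         # eq. (16.4.3)

harter = [5, 10, 17, 32, 32, 33, 34, 36, 54, 55, 55, 58, 58, 61, 64, 65, 65, 66, 67, 68]
theta_hat, beta_hat = weibull_mle(harter, n=40)
print(round(theta_hat, 1), round(beta_hat, 2))
```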

Note that the MLEs for $\xi$ and $\delta$ in the extreme-value notation are simply

$$\hat\xi = \ln\hat\theta, \quad \hat\delta = 1/\hat\beta \tag{16.4.6}$$


It may initially seem unclear how to develop inference procedures based on the 
MLEs in this case. If the estimators cannot be expressed explicitly, then how can 
their distributions be determined? Two key factors are involved in determining 
distributional results in this case. These are the recognition of pivotal quantities 
and the ability to determine their distributions by Monte Carlo simulation. 

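It follows from Theorem 11.3.2 for the extreme-value model with location-scale parameters that

$$\frac{\hat\xi - \xi}{\hat\delta}, \quad \frac{\hat\xi - \xi}{\delta}, \quad \frac{\hat\delta}{\delta} \tag{16.4.7}$$

are pivotal quantities with distributions that do not depend on any unknown parameters. Thus, in the Weibull notation, it follows that

$$\left(\frac{\hat\theta}{\theta}\right)^{\hat\beta}, \quad \left(\frac{\hat\theta}{\theta}\right)^{\beta}, \quad \text{and} \quad \frac{\hat\beta}{\beta} \tag{16.4.8}$$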


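are also pivotal quantities. For the Weibull case, the reliability at time $t$ is given by

$$R = R(t) = e^{-(t/\theta)^\beta}$$

A pivotal quantity for $R$ is not available, but the distribution of $\hat R$ depends only on $R$ and not on $t$, $\theta$, and $\beta$ individually. This result is true in general for location-scale models (or related distributions such as the Weibull), but it is shown directly in this case, because

$$-\ln\hat R = \left(\frac{t}{\hat\theta}\right)^{\hat\beta} = \frac{[(t/\theta)^\beta]^{\hat\beta/\beta}}{(\hat\theta/\theta)^{\hat\beta}} = \frac{(-\ln R)^{\hat\beta/\beta}}{(\hat\theta/\theta)^{\hat\beta}} \tag{16.4.9}$$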


which is a function only of R and the previous pivotal quantities. This result 
makes it possible to test hypotheses about R or set confidence intervals on R, if 
the distribution of R can be obtained for various R values. Recognition of these 
pivotal quantity properties makes it quite feasible to determine percentiles for the 
necessary distributions by Monte Carlo simulation. 

For example, we may desire to know the percentile, $q_\gamma$, such that

$$P[\hat\beta/\beta \le q_\gamma] = \gamma$$

for some sample size $n$. Let us generate, say, 1000 random samples of size $n$ from a standard Weibull distribution, $\mathrm{WEI}(1, 1)$, and compute the MLE of $\beta$, say $\hat\beta_{11}$, for each sample. In particular, we could determine the number, $q_\gamma$, for which $100\gamma\%$ of the calculated values of $\hat\beta_{11}$ were smaller than $q_\gamma$. Approximately, then,

$$P[\hat\beta_{11} \le q_\gamma] = \gamma$$

This approximation can be improved by increasing the number of simulated samples within the limits of the random number generator. Now, because the distribution of $\hat\beta/\beta$ does not depend on the values of the unknown parameters, the distribution of $\hat\beta/\beta$ is the same as the distribution of $\hat\beta_{11}/1$; thus, approximately,

$$P[\hat\beta/\beta \le q_\gamma] = \gamma$$

For example, within simulation error, $\hat\beta/q_\gamma$ is a lower $100\gamma\%$ confidence limit for $\beta$.
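The Monte Carlo procedure just described can be sketched in Python for complete samples. The sketch uses `random.weibullvariate(1.0, 1.0)` to draw from WEI(1, 1) and a Newton-Raphson solver for (16.4.4). The number of replications, the seed, and the empirical-quantile convention are our own choices. The test also checks the pivotal property directly: rescaling a sample to $\mathrm{WEI}(100, 2)$ multiplies $\hat\beta$ by exactly 2.

```python
import math
import random

def weibull_beta_mle(x, tol=1e-10, max_iter=100):
    """Complete-sample Weibull shape MLE: Newton-Raphson on eq. (16.4.4)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    beta = 1.0
    for _ in range(max_iter):
        p = [math.exp(beta * li) for li in logs]          # x_i^beta
        B = sum(p)
        A = sum(pi * li for pi, li in zip(p, logs))
        C = sum(pi * li * li for pi, li in zip(p, logs))
        g = A / B - 1.0 / beta - mean_log
        dg = (C * B - A * A) / (B * B) + 1.0 / beta ** 2
        new_beta = beta - g / dg
        if new_beta <= 0:                                 # safeguard: stay positive
            new_beta = beta / 2.0
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta

def beta_ratio_quantile(n, gamma, reps=1000, seed=1):
    """Monte Carlo q_gamma with P[beta_hat/beta <= q_gamma] ~ gamma, simulated
    from WEI(1, 1); by the pivotal property it applies for any theta and beta."""
    rng = random.Random(seed)
    sims = sorted(weibull_beta_mle([rng.weibullvariate(1.0, 1.0) for _ in range(n)])
                  for _ in range(reps))
    k = min(reps - 1, max(0, math.ceil(gamma * reps) - 1))   # empirical gamma-quantile
    return sims[k]

q95 = beta_ratio_quantile(n=20, gamma=0.95)
print(q95)   # beta_hat / q95 is then an approximate lower 95% limit for beta
```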
Tables of percentage points for the quantities $\sqrt{n}(\hat\beta/\beta - 1)$ and $\sqrt{n}\,\hat\beta\ln(\hat\theta/\theta)$, and other tables for determining tolerance limits and confidence limits on reliability, are provided by Bain and Engelhardt (1991) for both complete and censored sampling cases.


ASYMPTOTIC RESULTS 


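Convergence is somewhat slow in the Weibull case, but for reasonably large $n$ the asymptotic normality properties of the MLEs become useful. As $n \to \infty$ and $r/n \to p$, the following properties hold asymptotically:

$$\sqrt{r}\,(\hat\beta - \beta)/\beta \sim N(0, a_{22}) \tag{16.4.10}$$

$$\sqrt{r}\,\beta\ln(\hat\theta/\theta) \sim N(0, a_{11}) \tag{16.4.11}$$

$$\sqrt{r}\,[\hat R(t) - R(t)] \sim N(0, V_R) \tag{16.4.12}$$

where $a_{11} = r\beta^2\,\mathrm{Var}(\hat\theta/\theta)$, $a_{22} = r\,\mathrm{Var}(\hat\beta/\beta)$, and $a_{12} = r\,\mathrm{Cov}(\hat\theta/\theta, \hat\beta/\beta)$ are the asymptotic variances and covariances, and

$$V_R = R^2(\ln R)^2\{a_{11} - 2a_{12}\ln(-\ln R) + a_{22}[\ln(-\ln R)]^2\} \tag{16.4.13}$$

The $a_{ij}$ are included in Table 15 (Appendix C) for censoring levels $p = 0.1, 0.2, \ldots, 1.0$. See also Harter (1969) for $c_{ij} = (n/r)a_{ij}$. Similar results hold for the extreme-value case where, asymptotically,

$$\sqrt{r}\,(\hat\delta - \delta)/\delta \sim N(0, a_{22}) \tag{16.4.14}$$

$$\sqrt{r}\,(\hat\xi - \xi)/\delta \sim N(0, a_{11}) \tag{16.4.15}$$

with $a_{11} = r\,\mathrm{Var}(\hat\xi/\delta)$, $a_{22} = r\,\mathrm{Var}(\hat\delta/\delta)$, and $r\,\mathrm{Cov}(\hat\xi/\delta, \hat\delta/\delta) = -a_{12}$.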
For the Weibull distribution, it appears that convergence to normality occurs faster for an alternate pivotal quantity of the form

$$W(d) = \frac{\hat\xi - \xi}{\delta} - d\,\frac{\hat\delta}{\delta} \tag{16.4.16}$$

Confidence limits on $\xi$, $R(t)$, or percentiles, based on $W(d)$, will be equivalent to those based directly on the earlier pivotal quantities. Johns and Lieberman (1966) consider confidence limits using $W(d)$ based on simpler estimators, and Jones et al. (1984) develop limits essentially based on $W(d)$ [see also Bain and Engelhardt (1986)]. Let $w_\gamma(d)$ be the $\gamma$ percentage point such that

$$P[W(d) \le w_\gamma(d)] = \gamma$$

then the asymptotic normal approximation for $w_\gamma(d)$ is

$$w_\gamma(d) \approx -d + \sigma_d z_\gamma \tag{16.4.17}$$

where $r\sigma_d^2 = a_{11} + d^2a_{22} - 2da_{12} = A(d)$.
We now consider first approximate confidence limits on reliability. Note that in the extreme-value notation, the reliability at time $y$ is related to the Weibull reliability as

$$R_Y(y) = P[Y > y] = P[\ln X > y] = P[X > e^y] = R_X(e^y) \tag{16.4.18}$$

and

$$R_X(x) = R_Y(\ln x) \tag{16.4.19}$$

Now for a fixed time $x$, let $y = \ln x$, and a lower $\gamma$ confidence limit for $R_X(x)$ is given by

$$L(R_X(x)) = \exp[-\exp(-\hat d + z_\gamma\sigma_{\hat d})] \tag{16.4.20}$$

where $\hat d = -\ln[-\ln\hat R_X(x)] = (\hat\xi - y)/\hat\delta$ and $\sigma_{\hat d} = \sigma(\hat d) = \sqrt{A(\hat d)/r}$. This follows because

$$\begin{aligned} P[L(R_X(x)) \le R_X(x)] &= P\{\exp[-\exp(-\hat d + z_\gamma\sigma_{\hat d})] \le \exp[-(x/\theta)^\beta]\} \\ &= P\{-\hat d + z_\gamma\sigma_{\hat d} \ge \beta\ln(x/\theta)\} \\ &= P[W(\hat d) \le w_\gamma(\hat d)] \\ &= E_{\hat d}\,P[W(d) \le w_\gamma(d) \mid \hat d] \\ &\approx E_{\hat d}(\gamma) \\ &= \gamma \end{aligned}$$

Approximate confidence limits for percentiles can be similarly determined. The $100\alpha$ percentile for the Weibull distribution is given by

$$x_\alpha = \theta[-\ln(1 - \alpha)]^{1/\beta} = \exp(y_\alpha) \tag{16.4.21}$$

where

$$y_\alpha = \xi + \delta\ln[-\ln(1 - \alpha)] = \xi + \delta\lambda_\alpha \tag{16.4.22}$$

is the $\alpha$ percentile for the extreme-value distribution. For the special case of $\alpha = 1 - 1/e$, $\lambda_\alpha = 0$ and these percentiles reduce to $y_\alpha = \xi$ and $x_\alpha = \theta$. Note that a lower $\gamma$ level confidence limit for $x_\alpha$, say $L(x_\alpha; \gamma)$, is also a $\gamma$-probability tolerance limit for proportion $1 - \alpha$ for the Weibull distribution. For the extreme-value distribution

$$L(y_\alpha; \gamma) = \ln L(x_\alpha; \gamma) \tag{16.4.23}$$




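In terms of the pivotal quantity

$$Q = \frac{\hat y_\alpha - y_\alpha}{\hat\delta} \tag{16.4.24}$$

$$L(y_\alpha; \gamma) = \hat y_\alpha - q_\gamma\hat\delta \tag{16.4.25}$$

where $P[Q \le q_\gamma] = \gamma$. The Monte Carlo value $q_\gamma$ may be determined from Bain and Engelhardt (1991). In terms of $W(d)$,

$$\gamma = P[Q \le q_\gamma] = P\left[\frac{\hat\xi - \xi}{\delta} - (q_\gamma - \lambda_\alpha)\frac{\hat\delta}{\delta} \le \lambda_\alpha\right] = P[W(d) \le \lambda_\alpha]$$

which gives $w_\gamma(d) = \lambda_\alpha$, where $d = q_\gamma - \lambda_\alpha$. That is, $q_\gamma = d + \lambda_\alpha$, where $d$ is the solution to $w_\gamma(d) = \lambda_\alpha$. Using the asymptotic normal approximation for $w_\gamma(d)$, we may solve for $d$ to obtain

$$d = \frac{-a_{12}z_\gamma^2 - r\lambda_\alpha + z_\gamma\left[(a_{12}^2 - a_{11}a_{22})z_\gamma^2 + ra_{11} + 2ra_{12}\lambda_\alpha + ra_{22}\lambda_\alpha^2\right]^{1/2}}{r - a_{22}z_\gamma^2} \tag{16.4.26}$$

Then

$$L(y_\alpha; \gamma) = \hat y_\alpha - (d + \lambda_\alpha)\hat\delta = \hat\xi - d\hat\delta \tag{16.4.27}$$

and

$$L(x_\alpha; \gamma) = \exp[L(y_\alpha; \gamma)] = \hat\theta e^{-d\hat\delta} \tag{16.4.28}$$

Lower confidence limits for $\xi$ and $\theta$ are obtained by letting $\lambda_\alpha = 0$ in computing $d$.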


INFERENCES ON β OR δ


A chi-square approximation often is useful for positive variables. An approximate distribution for a variable $U$ with two correct moments is achieved by considering

$$cU \sim \chi^2(\nu) \tag{16.4.29}$$

where $c$ and $\nu$ are chosen to satisfy $cE(U) = \nu$ and $c^2\,\mathrm{Var}(U) = 2\nu$. Following along these lines, Bain and Engelhardt (1986) propose the simple approximate distributional result given by

$$cr(\hat\delta/\delta)^{1+p^2} \sim \chi^2(c(r - 1)) \tag{16.4.30}$$

where $p = r/n$, $c = 2/[(1 + p^2)^2a_{22}]$, and $a_{22}$ is the asymptotic variance of $\sqrt{r}\,\hat\delta/\delta$ as $n \to \infty$ and $r/n \to p$. Values of $a_{22}$ are given in Table 15 (Appendix C), and values of $c$ also are included for convenience. The constant $c$ makes this approximation become correct asymptotically.

It is clear that inferences on $\beta$ or $\delta$ can be carried out easily based on the above approximation. For example, a lower $\gamma$ level confidence limit for $\delta$ is given by

$$\delta_L = \hat\delta\left[\frac{\chi^2_\gamma(c(r - 1))}{cr}\right]^{-1/(1+p^2)} \tag{16.4.31}$$

An upper limit is obtained by replacing $\gamma$ with $1 - \gamma$.


The 20 smallest ordered observations from a simulated random sample of size 40 from a Weibull distribution with $\theta = 100$ and $\beta = 2$ are given by Harter (1969) as follows:

5, 10, 17, 32, 32, 33, 34, 36, 54, 55, 55, 58, 58, 61, 64, 65, 65, 66,
67, 68

It is possible to observe how the statistical results relate to the known model for this data. Also, Monte Carlo tables happen to be available for this particular sample size and censoring level, so the approximate results can be compared to the results that would be obtained from using the Monte Carlo tables in this case.

For this set of data, $r = 20$, $n = 40$, $p = 0.5$, $\hat\beta = 2.09$, and $\hat\theta = 83.8$. In the extreme-value notation, $\hat\xi = \ln\hat\theta = 4.43$ and $\hat\delta = 1/\hat\beta = 0.478$. These also are the values that would be obtained if one directly computed the maximum likelihood estimators of $\xi$ and $\delta$ in an extreme-value distribution based on the natural logarithms of the above data, $y_i = \ln x_i$.

Now, an upper $\gamma$ level confidence limit $\beta_U$ such that $P[\beta \le \beta_U] = \gamma$ is given by

$$\beta_U = \hat\beta\left[\frac{\chi^2_\gamma(c(r - 1))}{cr}\right]^{1/(1+p^2)} \tag{16.4.32}$$

Similarly, a lower $\gamma$ level confidence limit is given by

$$\beta_L = \hat\beta\left[\frac{\chi^2_{1-\gamma}(c(r - 1))}{cr}\right]^{1/(1+p^2)} \tag{16.4.33}$$

For $\gamma = 0.95$, based on the above data, $c = 1.49$ from Table 15 (Appendix C) and

$$\beta_L = 2.09[\chi^2_{0.05}(28.3)/29.8]^{1/1.25} = 1.34$$

and

$$\beta_U = 2.09[\chi^2_{0.95}(28.3)/29.8]^{1/1.25} = 2.73$$

If the tables in Bain and Engelhardt (1991) based on Monte Carlo simulation are used, one obtains $\beta_L = 1.34$ and $\beta_U = 2.72$ in this case.

Note that in the extreme-value notation, $\delta_L = 1/\beta_U$ and $\delta_U = 1/\beta_L$.
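The limits (16.4.32) and (16.4.33) need chi-square percentiles at non-integer degrees of freedom such as 28.3. The Python sketch below uses the Wilson-Hilferty approximation for them. That approximation is our substitution, not a method from the text, and it is accurate enough here to reproduce 1.34 and 2.73. The value $a_{22} = 0.85809$ for $p = 0.5$ is the Table 15 entry quoted in the reliability example below. `statistics.NormalDist` supplies the normal percentile.

```python
import math
from statistics import NormalDist

def chi2_ppf_wh(p, nu):
    """Wilson-Hilferty approximation to chi2_p(nu); allows non-integer nu."""
    z = NormalDist().inv_cdf(p)
    k = 2.0 / (9.0 * nu)
    return nu * (1.0 - k + z * math.sqrt(k)) ** 3

def beta_limits(beta_hat, r, n, a22, gamma):
    """Approximate gamma-level lower and upper limits for the Weibull shape,
    eqs. (16.4.32)-(16.4.33), from c r (delta_hat/delta)^(1+p^2) ~ chi2(c(r-1))."""
    p = r / n
    c = 2.0 / ((1.0 + p * p) ** 2 * a22)
    expo = 1.0 / (1.0 + p * p)
    nu, cr = c * (r - 1), c * r
    lower = beta_hat * (chi2_ppf_wh(1.0 - gamma, nu) / cr) ** expo
    upper = beta_hat * (chi2_ppf_wh(gamma, nu) / cr) ** expo
    return c, lower, upper

c, bL, bU = beta_limits(beta_hat=2.09, r=20, n=40, a22=0.85809, gamma=0.95)
print(round(c, 2), round(bL, 2), round(bU, 2))
```

Limits for $\delta$ follow as $1/\beta_U$ and $1/\beta_L$.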

We now will illustrate the lower confidence limit for reliability at time $t$, $R_X(t)$. The true reliability at time $t = 32.46$ is 0.90 for a Weibull distribution with $\theta = 100$ and $\beta = 2$. Thus, let us compute a lower confidence limit for $R_X(32.46)$ based on the above data. The lower confidence limit for $R_X(t)$ is given by

$$L(R_X(t); \gamma) = \exp\left[-\exp\left(-\hat d + \sqrt{A/r}\,z_\gamma\right)\right] \tag{16.4.34}$$

where $\hat d = -\ln[-\ln\hat R_X(t)]$, $A = a_{11} + \hat d^2a_{22} - 2\hat da_{12}$, and the $a_{ij}$ are given in Table 15. We have

$$\hat R_X(32.46) = \exp[-(32.46/83.8)^{2.09}] = 0.871$$

$$\hat d = -\ln[-\ln 0.871] = 1.98$$

$$A = 1.25512 + (1.98)^2(0.85809) - 2(1.98)(0.46788) = 2.766$$

and for, say, $\gamma = 0.90$, we have $z_\gamma = 1.282$ and

$$L(R_X(32.46); 0.90) = \exp\left[-\exp\left(-1.98 + \sqrt{2.766}\,(1.282)/\sqrt{20}\right)\right] = 0.801$$

Again, direct use of the Monte Carlo tables gives nearly the same result, 0.797. Considering the reliability at time $t = 32.46$ in the Weibull distribution is comparable to considering the reliability at $y_0 = \ln(32.46) = 3.48$ in the analogous extreme-value model. Thus, for example, a 90% lower confidence limit for $R_Y(3.48)$ is also

$$L(R_Y(3.48); 0.90) = L(R_X(e^{3.48}); 0.90) = 0.801$$
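Equation (16.4.34) takes only a few lines of Python. The sketch below plugs in the estimates and the $a_{ij}$ values quoted above. It carries $\hat d$ unrounded, so $A$ comes out near 2.77 rather than the 2.766 obtained with $\hat d = 1.98$. It uses the exact normal percentile from `statistics.NormalDist` instead of 1.282. The lower limit still rounds to 0.801.

```python
import math
from statistics import NormalDist

def reliability_lower_limit(t, theta_hat, beta_hat, r, a11, a22, a12, gamma):
    """Approximate lower gamma-level limit for R_X(t), eq. (16.4.34)."""
    R_hat = math.exp(-(t / theta_hat) ** beta_hat)
    d = -math.log(-math.log(R_hat))          # d_hat = -ln[-ln R_hat]
    A = a11 + d * d * a22 - 2.0 * d * a12    # A(d) = r * sigma_d^2
    z = NormalDist().inv_cdf(gamma)
    return R_hat, d, A, math.exp(-math.exp(-d + z * math.sqrt(A / r)))

R_hat, d, A, lower = reliability_lower_limit(
    32.46, theta_hat=83.8, beta_hat=2.09, r=20,
    a11=1.25512, a22=0.85809, a12=0.46788, gamma=0.90)
print(round(R_hat, 3), round(d, 2), round(A, 2), round(lower, 3))
```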
We now will illustrate a tolerance limit or confidence limit on a percentile. The $100\alpha$ percentile, $x_\alpha$, for the Weibull distribution and $y_\alpha$ for the extreme-value distribution are given in equations (16.4.21) and (16.4.22). If, for example, $\alpha = 0.10$, then

$$\hat x_\alpha = \hat\theta[-\ln(1 - 0.10)]^{1/\hat\beta} = 28.6$$

and

$$\hat y_\alpha = \ln(\hat x_\alpha) = 3.35$$


A lower $\gamma$ level tolerance limit for proportion $1 - \alpha$ for the extreme-value distribution is, from equation (16.4.27),

$$L(y_\alpha; \gamma) = \hat y_\alpha - (d + \lambda_\alpha)\hat\delta = \hat\xi - d\hat\delta \qquad (16.4.35)$$

and for the Weibull distribution

$$L(x_\alpha; \gamma) = \exp[L(y_\alpha; \gamma)] \qquad (16.4.36)$$

where $\lambda_\alpha = \ln[-\ln(1 - \alpha)]$ and $d$ is given by equation (16.4.26). We have $\lambda_{0.10} = \ln[-\ln(1 - 0.10)] = -2.25$, and if we choose $\gamma = 0.90$ in our example, then $z_{0.90} = 1.282$, and


$$d = \frac{-0.46788(1.282)^2 - 20(-2.25) + 1.282\left[\left((0.46788)^2 - (1.25512)(0.85809)\right)(1.282)^2 + 20(1.25512) + 2(20)(0.46788)(-2.25) + 20(0.85809)(-2.25)^2\right]^{1/2}}{20 - 0.85809(1.282)^2} = 2.95$$

$$L(y_{0.10}; 0.90) = 3.35 - (2.95 - 2.25)(0.478) = 3.02$$

and

$$L(x_{0.10}; 0.90) = \exp[L(y_{0.10}; 0.90)] = 20.4$$


Again, direct use of the Monte Carlo tables gives almost the same result, 20.3. 
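The tolerance-limit calculation can be reproduced in the same way. The helper `tolerance_factor` below is hypothetical. Its formula is read off from the numerical expression for $d$ in the example, with $r = 20$ wherever 20 appears; it is not copied from equation (16.4.26). Treat it as a sketch under that assumption.

```python
import math


def tolerance_factor(lam_alpha, z, r, a11, a22, a12):
    """Factor d in the form used by the worked example (assumed to match (16.4.26))."""
    radicand = ((a12 ** 2 - a11 * a22) * z ** 2 + r * a11
                + 2 * r * a12 * lam_alpha + r * a22 * lam_alpha ** 2)
    return (-a12 * z ** 2 - r * lam_alpha + z * math.sqrt(radicand)) / (r - a22 * z ** 2)


theta_hat, beta_hat = 83.8, 2.09
xi_hat, delta_hat = math.log(theta_hat), 1.0 / beta_hat   # extreme-value parameters
alpha, z_gamma, r = 0.10, 1.282, 20                       # r = 20 is an assumption
lam_alpha = math.log(-math.log(1.0 - alpha))              # about -2.25

x_alpha_hat = theta_hat * (-math.log(1.0 - alpha)) ** (1.0 / beta_hat)  # MLE of x_0.10
d = tolerance_factor(lam_alpha, z_gamma, r, a11=1.25512, a22=0.85809, a12=0.46788)
L_y = xi_hat - d * delta_hat      # (16.4.35), extreme-value scale
L_x = math.exp(L_y)               # (16.4.36), Weibull scale
print(f"x_hat = {x_alpha_hat:.1f}, d = {d:.2f}, L(y) = {L_y:.2f}, L(x) = {L_x:.1f}")
```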
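A lower $\gamma$ level confidence limit for the parameter $\xi$ or $\theta$ is given by

$$L(\xi; \gamma) = \hat\xi - d\hat\delta \qquad (16.4.37)$$

or

$$L(\theta; \gamma) = \hat\theta\,e^{-d/\hat\beta} \qquad (16.4.38)$$

where $d$ is computed with $\alpha = 1 - 1/e$ and $\lambda_\alpha = 0$, because $\theta$ is the $100(1 - 1/e)$th percentile of the Weibull distribution. Note that upper $\gamma$ level confidence limits are given by replacing $\gamma$ with $1 - \gamma$:

$$U(\theta; \gamma) = L(\theta; 1 - \gamma) \qquad (16.4.39)$$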


Let us find a two-sided 90% confidence interval for $\theta$. We must compute $d$ with $z_{0.95} = 1.645$ and $z_{0.05} = -1.645$. This simply changes the sign of the term in brackets; thus the two values of $d$ are given by

$$d = \frac{-0.46788(1.645)^2 \pm 1.645\left[\left((0.46788)^2 - (1.25512)(0.85809)\right)(1.645)^2 + 20(1.25512)\right]^{1/2}}{20 - (0.85809)(1.645)^2} = 0.372 \text{ or } -0.515$$

This gives

$$L(\theta; 0.95) = 83.8\,e^{-0.372/2.09} = 70.1$$

and

$$U(\theta; 0.95) = 83.8\,e^{0.515/2.09} = 107.2$$


Thus a two-sided 90% confidence interval for $\theta$ is (70.1, 107.2), and a two-sided 90% confidence interval for $\xi$ is

$$(\ln 70.1, \ln 107.2) = (4.25, 4.67)$$


The corresponding interval from the Monte Carlo tables is found to be 
(4.27, 4.71). 
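For the two-sided interval, (16.4.38) is evaluated twice: once with $z = 1.645$ and once with $z = -1.645$, both with $\lambda_\alpha = 0$. The following is a Python sketch. It again assumes the form of $d$ read off from the worked example and $r = 20$. `d_factor` is a hypothetical helper name.

```python
import math


def d_factor(z, r, a11, a22, a12, lam_alpha=0.0):
    """Same assumed form of d as in the tolerance-limit example."""
    radicand = ((a12 ** 2 - a11 * a22) * z ** 2 + r * a11
                + 2 * r * a12 * lam_alpha + r * a22 * lam_alpha ** 2)
    return (-a12 * z ** 2 - r * lam_alpha + z * math.sqrt(radicand)) / (r - a22 * z ** 2)


theta_hat, beta_hat, r = 83.8, 2.09, 20          # r = 20 assumed
a = dict(a11=1.25512, a22=0.85809, a12=0.46788)  # Table 15 values quoted in the text

d_lo = d_factor(1.645, r, **a)     # z_0.95: gives L(theta; 0.95)
d_up = d_factor(-1.645, r, **a)    # z_0.05: gives U(theta; 0.95) = L(theta; 0.05)
lower = theta_hat * math.exp(-d_lo / beta_hat)   # (16.4.38)
upper = theta_hat * math.exp(-d_up / beta_hat)   # (16.4.38) with gamma replaced by 0.05
xi_interval = (math.log(lower), math.log(upper))
print(f"d = {d_lo:.3f}, {d_up:.3f}; theta in ({lower:.1f}, {upper:.1f}); "
      f"xi in ({xi_interval[0]:.2f}, {xi_interval[1]:.2f})")
```

Because $z$ enters the bracketed term linearly, a negative $z$ flips its sign automatically. This mirrors the remark in the text.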


SIMPLE ESTIMATORS 


Computation of the MLEs is relatively simple if a computer is available; 
however, it sometimes is more convenient to have simpler closed-form estimators 
available. A small set of sufficient statistics does not exist for the Weibull dis- 
tribution, so it is not completely clear how to proceed with this model. We know 
that the MLEs are asymptotically efficient, and they are good estimators for 
small n except for their difficulty of computation. The MLEs also are somewhat
biased for small n and heavy censoring. 

Simpler unbiased estimators have been developed for the location-scale parameters of the extreme-value distribution (see Engelhardt and Bain, 1977b), and
these can be applied to the Weibull parameters. These estimators are very similar 
to the MLEs, and for the most part can be used interchangeably with them. If an 
adjustment is made for the bias of the MLEs, then these two methods are essen- 
tially equivalent, particularly for the censored sampling case. The simple estima- 
tors still require some tabulated constants for their computation. 

Let $x_{1:n}, \ldots, x_{r:n}$ denote the $r$ smallest observations from a sample of size $n$ from a Weibull distribution, and let $y_i = \ln x_{i:n}$ denote the corresponding ordered extreme-value observations. The simple estimators then are computed as follows:


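1. Complete sample case, $r = n$:

$$\tilde\delta = 1/\tilde\beta = \left[-\sum_{i=1}^{s} y_i + \frac{s}{n-s}\sum_{i=s+1}^{n} y_i\right] \Big/ (n k_n) \qquad (16.4.40)$$

$$\tilde\xi = \ln\tilde\theta = \bar y + \gamma\tilde\delta \qquad (16.4.41)$$

where $s = [0.84n]$ is the largest integer $\le 0.84n$, $\bar y$ is the mean of the $y_i$, and $\gamma = 0.5772$ (Euler's constant). Some values of $k_n$ are provided in Table 16 (Appendix C).

2. Censored samples, $r < n$:

$$\tilde\delta = 1/\tilde\beta = \sum_{i=1}^{r} (y_r - y_i) \Big/ (n k_{r,n}) \qquad (16.4.42)$$

$$\tilde\xi = \ln\tilde\theta = y_r - c_{r,n}\tilde\delta \qquad (16.4.43)$$

Quadratic approximations for computing $k_{r,n}$ and $c_{r,n}$ are given by

$$k_{r,n} \approx k_0 + k_1/n + k_2/n^2 \qquad (16.4.44)$$

$$c_{r,n} = E(Y_r - \xi)/\delta \approx c_0 + c_1/n + c_2/n^2 \qquad (16.4.45)$$

where the coefficients are tabulated in Table 17 (Appendix C). These constants make $\tilde\delta$ and $\tilde\xi$ unbiased estimators of $\delta$ and $\xi$. The values $k_0$ and $c_0$ are the asymptotic values as $n \to \infty$ and $r/n \to p$.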
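A direct transcription of (16.4.40) through (16.4.43) into Python is sketched below. The constants $k_n$, $k_{r,n}$, and $c_{r,n}$ must come from Tables 16 and 17, which are not reproduced here. The functions therefore take them as arguments. The values used in the example calls are placeholders, not tabled constants. The quantity $s = [0.84n]$ is computed with integer arithmetic to avoid a floating-point floor error.

```python
import math

EULER_GAMMA = 0.5772  # Euler's constant, the gamma of (16.4.41)


def simple_complete(x, k_n):
    """Simple estimators (16.4.40)-(16.4.41) for a complete Weibull sample.

    k_n is the unbiasing constant from Table 16 (not reproduced here).
    Returns (delta, xi, beta, theta).
    """
    y = sorted(math.log(v) for v in x)
    n = len(y)
    s = (84 * n) // 100               # s = [0.84 n] without float rounding
    delta = (-sum(y[:s]) + s / (n - s) * sum(y[s:])) / (n * k_n)
    xi = sum(y) / n + EULER_GAMMA * delta
    return delta, xi, 1.0 / delta, math.exp(xi)


def simple_censored(x_smallest, n, k_rn, c_rn):
    """Simple estimators (16.4.42)-(16.4.43) from the r smallest of n observations.

    k_rn and c_rn come from the Table 17 approximations (16.4.44)-(16.4.45).
    """
    y = sorted(math.log(v) for v in x_smallest)
    y_r = y[-1]                        # log of the r-th order statistic
    delta = sum(y_r - yi for yi in y) / (n * k_rn)
    xi = y_r - c_rn * delta
    return delta, xi, 1.0 / delta, math.exp(xi)


# Placeholder constants (k = c = 1) and data with log values 1, 2, ..., chosen only
# so the arithmetic is easy to follow by hand.
complete_est = simple_complete([math.exp(k) for k in range(1, 11)], k_n=1.0)
censored_est = simple_censored([math.exp(k) for k in range(1, 5)], n=10, k_rn=1.0, c_rn=1.0)
print(complete_est)
print(censored_est)
```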


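If one wishes to substitute simple estimators for the MLEs, then slightly improved results are obtained by using the following modified simple estimators:

$$\tilde\delta^* = \tilde\delta/[1 + \mathrm{Var}(\tilde\delta/\delta)] = h\tilde\delta/(h + 2) \qquad (16.4.46)$$

$$\tilde\xi^* = \tilde\xi - \mathrm{Cov}(\tilde\xi/\delta, \tilde\delta/\delta)\,\tilde\delta^* \qquad (16.4.47)$$

where

$$\mathrm{Cov}(\tilde\xi/\delta, \tilde\delta/\delta) = [d_{r,n} - c_{r,n}(2n/h)]/n$$

$$d_{r,n} = n\,\mathrm{Cov}(Y_r/\delta, \tilde\delta/\delta) \approx d_0 + d_1/n + d_2/n^2$$

$$h/n = 2/[n\,\mathrm{Var}(\tilde\delta/\delta)] \approx a_0 + a_1/n + a_2/n^2$$

and the coefficients are included in Table 17 (Appendix C). Similarly, approximately debiased MLEs are given by

$$\hat\beta_u = \frac{(h - 2)\hat\beta}{h + 2}, \qquad \hat\delta_u = \frac{(h + 2)\hat\delta}{h}$$

$$\hat\xi_u = \hat\xi + \hat\delta\,\mathrm{Cov}(\tilde\xi/\delta, \tilde\delta/\delta)$$

Again, we have the approximations $\hat\delta_u \approx \tilde\delta$, $\hat\xi_u \approx \tilde\xi$, $\hat\delta \approx \tilde\delta^*$, and $\hat\xi \approx \tilde\xi^*$.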
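Once $h$ and the covariance term are known, equations (16.4.46) and (16.4.47) reduce to two lines of arithmetic. Because $\mathrm{Var}(\tilde\delta/\delta) = 2/h$, dividing by $1 + 2/h$ is the same as multiplying by $h/(h+2)$. The following is a minimal Python sketch. `h` and `cov_xd` are inputs you would compute from the Table 17 coefficients. The numbers in the example call are illustrative only.

```python
def modified_simple(delta_t, xi_t, h, cov_xd):
    """Modified simple estimators (16.4.46)-(16.4.47).

    h satisfies Var(delta~/delta) = 2/h, and cov_xd = Cov(xi~/delta, delta~/delta);
    both would be computed from the Table 17 coefficients.
    """
    var_ratio = 2.0 / h
    delta_star = delta_t / (1.0 + var_ratio)      # equals h * delta_t / (h + 2)
    xi_star = xi_t - cov_xd * delta_star
    return delta_star, xi_star


# Illustrative inputs only (not taken from Table 17).
delta_star, xi_star = modified_simple(delta_t=0.50, xi_t=4.40, h=38.0, cov_xd=0.10)
print(delta_star, xi_star)
```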


REPAIRABLE SYSTEMS 


Much of the theory of reliability deals with nonrepairable systems or devices, and 
it emphasizes the study of lifetime models. It is important to distinguish between
models for repairable and nonrepairable systems. A nonrepairable system can fail 
only once, and a lifetime model such as the Weibull distribution provides the 
distribution of the time at which such a system fails. This was the situation in the 
earlier sections of this chapter. On the other hand, a repairable system can be 
repaired and placed back in service. Thus, a model for repairable systems must 
allow for a whole sequence of repeated failures. 

One such model is the homogeneous Poisson process (HPP), which was
introduced in Chapter 3. In this section we will consider additional properties of 
the HPP and discuss some more general processes that are capable of reflecting 
changes in the reliability of the system as it ages. 


HOMOGENEOUS POISSON PROCESS 


We denote by $X(t)$ the number of occurrences (failures) in the time interval $[0, t]$. It was found in Theorem 3.2.4 that under the following conditions $X(t)$ is an HPP:

1. $X(0) = 0$.

2. $P[X(t + h) - X(t) = n \mid X(s) = m] = P[X(t + h) - X(t) = n]$ for all $0 \le s \le t$ and $0 < h$.

3. $P[X(t + \Delta t) - X(t) = 1] = \lambda\,\Delta t + o(\Delta t)$ for some constant $\lambda > 0$.

4. $P[X(t + \Delta t) - X(t) \ge 2] = o(\Delta t)$.

In other words, if conditions 1 through 4 are satisfied, then

$$P[X(t) = n] = e^{-\lambda t}(\lambda t)^n/n!$$

for all $n = 0, 1, \ldots$, and some $\lambda > 0$.

Thus, $X(t) \sim \mathrm{POI}(\lambda t)$, where $\mu = E[X(t)] = \lambda t$. The proportionality constant $\lambda$ reflects the rate of occurrence or intensity of the Poisson process. Because $\lambda$ is assumed constant over $t$, and the increments are independent, it turns out that one does not need to be concerned about the location of the interval under question, and the model $X \sim \mathrm{POI}(\mu)$ is applicable for any interval of length $t$, $[s, s + t]$, with $\mu = \lambda t$. The constant $\lambda$ is the rate of occurrence per unit length, and the interval is $t$ units long. This also is consistent with Theorem 3.2.4. In particular, the interval $[0, t]$ can be represented as a union of $n$ disjoint subintervals of lengths $t_1, \ldots, t_n$. If $Y = \sum_{i=1}^{n} X_i$, where $X_i$ is the number of occurrences in the $i$th subinterval, then $\mu_Y = \sum \mu_i = \lambda\sum t_i$, and $Y$ represents a Poisson variable with intensity rate $\lambda$ relative to the interval of length $\sum t_i$. That is, one can choose any interval, but the variable remains Poisson with the appropriate mean.


Let $X$ denote the number of alpha particles emitted from a bar of polonium in one second, and assume that the rate of emission is $\lambda = 0.5$ per second. Thus $\mu_X = 0.5(1) = 0.5$, and the Poisson model for this variable would be

$$f_X(x) = e^{-0.5}(0.5)^x/x! \qquad x = 0, 1, \ldots$$

For example,

$$P[X = 1] = f_X(1) = 0.30$$

Let $Y$ denote the number of emissions in an eight-second interval. One may consider $Y = \sum_{i=1}^{8} X_i$ with $\mu_Y = \sum_{i=1}^{8} 0.5 = 4$, or one may consider the mean of $Y$ as $\lambda t = 0.5(8) = 4$. In either case

$$f_Y(y) = e^{-\lambda t}(\lambda t)^y/y! = e^{-4}4^y/y! \qquad y = 0, 1, \ldots$$
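The additivity argument can be checked numerically. Convolving eight copies of the POI(0.5) pmf should reproduce POI(4). The following is a short Python sketch; the helper names are ours, and the one-second pmf is truncated at 30, where its tail is negligible.

```python
import math


def poisson_pmf(x, mu):
    return math.exp(-mu) * mu ** x / math.factorial(x)


def convolve(p, q):
    """pmf of the sum of two independent nonnegative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


lam = 0.5                                                 # emissions per second
one_second = [poisson_pmf(x, lam * 1) for x in range(30)]
print(f"P[X = 1] = {one_second[1]:.2f}")

eight_seconds = [1.0]                                     # pmf of an empty sum
for _ in range(8):
    eight_seconds = convolve(eight_seconds, one_second)

max_diff = max(abs(eight_seconds[y] - poisson_pmf(y, lam * 8)) for y in range(20))
print(f"largest difference from POI(4) for y < 20: {max_diff:.1e}")
```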


In practice one may wish to estimate the value of $\mu$ from data. A frequency histogram also would be useful to help evaluate whether the Poisson model provides an appropriate distribution of probability. Rutherford and Geiger (1910) observed the number of emissions in 2608 intervals of 7.5 seconds each, with the results shown in Table 16.1. Note that $y$ denotes the number of emissions in a 7.5-second interval in this example.

The table indicates that no emissions were observed in 57 intervals, 1 emission was observed in 203 of the intervals, and so on. If we let $Y$ denote the number of emissions in a 7.5-second period, and if we assume a Poisson model $Y \sim \mathrm{POI}(\mu)$,
TABLE 16.1 Observed number of alpha particle emissions in 2608 intervals of 7.5 seconds each

No. of Particles   No. of Intervals with   Estimated Expected
Emitted, $y$       $y$ Emissions, $m_y$    Numbers, $\hat e_y$
  0                    57                      54.40
  1                   203                     210.52
  2                   383                     407.36
  3                   525                     525.50
  4                   532                     508.42
  5                   408                     393.52
  6                   273                     253.82
  7                   139                     140.32
  8                    45                      67.88
  9                    27                      29.19
 10                    10                      11.30
 11                     4                       3.97
 12                     2                       1.28
≥13                     0                       0.52
Total                2608


then it would be reasonable to estimate $\mu$ with the sample mean, $\bar y$. In this example the data are grouped, so

$$\sum_{i=1}^{2608} y_i = \sum_{y} y\,m_y = 0(57) + 1(203) + 2(383) + \cdots = 10094$$

and $\bar y = 3.870$.

Using the fitted model $Y \sim \mathrm{POI}(3.870)$, in 2608 observed intervals one would expect $(2608)P[Y = 0] = (2608)e^{-3.870}(3.870)^0/0! = 54.4$ intervals with no emissions, $(2608)P[Y = 1] = 210.52$ intervals with 1 emission, and so on. These computed expected numbers are included in Table 16.1. The computed expected numbers appear to agree quite closely with the observed numbers, and the suggested Poisson model seems appropriate for this problem. More formal statistical tests for the goodness-of-fit of a model can be performed using the results of Section 13.7. If we combine the cases where $y \ge 11$, leaving 12 cells, the chi-square value is $\chi^2 = 12.97$ and $\nu = 12 - 1 - 1 = 10$, one degree of freedom being lost for the estimated $\mu$. Because $\chi^2_{0.90}(10) = 15.99$, we cannot reject the Poisson model at the $\alpha = 0.10$ level.
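The fitted values in Table 16.1 and the chi-square statistic can be recomputed from the observed counts. The following Python sketch estimates $\mu$ by the grouped sample mean and pools $y \ge 11$ as the text does. It uses the unrounded $\hat\mu = 10094/2608$ rather than 3.870, so the last digits of the expected counts may differ slightly from the table.

```python
import math

# Table 16.1: m_y for y = 0, 1, ..., 12, followed by the (empty) cell y >= 13.
counts = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 2, 0]
n = sum(counts)                                          # 2608 intervals
mu_hat = sum(y * m for y, m in enumerate(counts)) / n   # empty last cell adds nothing

probs = [math.exp(-mu_hat) * mu_hat ** y / math.factorial(y) for y in range(13)]
probs.append(1.0 - sum(probs))                           # P[Y >= 13]
expected = [n * p for p in probs]

# Pool y >= 11 into a single cell, as in the text, and form Pearson's statistic.
obs = counts[:11] + [sum(counts[11:])]
fit = expected[:11] + [sum(expected[11:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, fit))
df = len(obs) - 1 - 1                                    # one df lost for estimating mu

print(f"mu_hat = {mu_hat:.3f}, chi-square = {chi2:.2f}, df = {df}")
print([round(e, 2) for e in expected])
```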


EXPONENTIAL WAITING TIMES 


With any Poisson process there is an associated sequence of continuous waiting 
times for successive occurrences. 
Theorem 16.5.1 If events are occurring according to an HPP with intensity parameter $\lambda$, then the waiting time until the first occurrence, $T_1$, follows an exponential distribution, $T_1 \sim \mathrm{EXP}(\theta)$ with $\theta = 1/\lambda$. Furthermore, the waiting times between consecutive occurrences are independent exponential variables with the same mean time between occurrences, $1/\lambda$.


Proof
The CDF of $T_1$ at time $t$ is given by

$$F_1(t) = P[T_1 \le t] = 1 - P[T_1 > t]$$

Now $T_1 > t$ if and only if no events occur in the interval $[0, t]$, that is, if $X(t) = 0$. Hence,

$$F_1(t) = 1 - P[X(t) = 0] = 1 - P_0(t) = 1 - e^{-\lambda t}$$

which is an exponential CDF with mean $1/\lambda$.
The proof of the second part is beyond the scope of this book (see Parzen 1962, p. 135). ■


We see that the mean time to failure, $\theta$, is inversely related to the failure intensity $\lambda$. The HPP assumptions are rather restrictive, but at least a very tractable and easily analyzed model is realized.


Theorem 16.5.2 If $T_k$ denotes the waiting time until the $k$th occurrence in an HPP, then $T_k \sim \mathrm{GAM}(1/\lambda, k)$.

Proof
The CDF of $T_k$ at time $t$ is given by

$$\begin{aligned} F_k(t) &= 1 - P[T_k > t] \\ &= 1 - P[k - 1 \text{ or fewer occurrences in } [0, t]] \\ &= 1 - \sum_{i=0}^{k-1} P_i(t) \\ &= 1 - \sum_{i=0}^{k-1} (\lambda t)^i e^{-\lambda t}/i! \end{aligned}$$

which is the CDF of a gamma variable with parameters $k$ and $1/\lambda$. ■


This result also is consistent with the second part of Theorem 16.5.1; if we assume independent $Y_i \sim \mathrm{EXP}(1/\lambda)$, then

$$T_k = \sum_{i=1}^{k} Y_i \sim \mathrm{GAM}(1/\lambda, k)$$
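Theorem 16.5.2 states that the Poisson-sum expression for $F_k(t)$ is the gamma CDF. A quick numerical check compares it with Simpson's-rule integration of the GAM($1/\lambda$, $k$) density. The intensity $\lambda = 0.0168$ and time $t = 200$ are chosen only for illustration.

```python
import math


def cdf_via_poisson(t, k, lam):
    """F_k(t) = 1 - sum_{i=0}^{k-1} e^{-lam t} (lam t)^i / i!  (proof of Theorem 16.5.2)."""
    mu = lam * t
    return 1.0 - sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))


def cdf_via_gamma_density(t, k, lam, steps=20000):
    """Integrate the GAM(1/lam, k) density over [0, t] with Simpson's rule (steps even)."""
    def pdf(s):
        return lam ** k * s ** (k - 1) * math.exp(-lam * s) / math.factorial(k - 1)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * pdf(j * h)
    return total * h / 3.0


lam, t = 0.0168, 200.0      # illustrative values
for k in range(1, 6):
    print(k, round(cdf_via_poisson(t, k, lam), 6), round(cdf_via_gamma_density(t, k, lam), 6))
```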


It was observed earlier that the exponential distribution has a no-memory property. This is related to the assumption of a constant failure intensity $\lambda$, which implies that no wearout is occurring. It sometimes is said that the exponential distribution is applicable when the failures occur “at random” and are not affected by aging. We know that the “at random” terminology is related to the uniform distribution, but it is used in this framework for the following reason. If $T_1, T_2, \ldots$ denote the successive times of occurrence of a Poisson process measured from time 0, then given that $n$ events have occurred in the interval $[0, t]$, the successive occurrence times $T_1, \ldots, T_n$ are conditionally distributed as ordered observations from a uniform distribution on $[0, t]$.


Proschan (1963) gives the times of successive failures of the air conditioning system of each member of a fleet of Boeing 720 jet airplanes. The hours of flying time, $y_i$, between the 30 successive failures on plane 7912 are listed below.


23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5, 12, 120, 
11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95 


If we assume that the failures follow an HPP with intensity $\lambda$, then this set of data represents a random sample of size 30 from $\mathrm{EXP}(1/\lambda)$. Using the sample mean to estimate the population mean gives $\hat\theta = \bar y = 59.6 = 1/\hat\lambda$, and $\hat\lambda = 1/59.6 = 0.0168$.

Let $X$ denote the number of failures in a 200-hour interval from this process. Then $X$ follows a Poisson distribution with $\mu = \lambda t = 200\lambda$. If we wish to estimate $\lambda$ using the Poisson count data, we first consider the successive failure times of the observed data, given by


23, 284, 371, 378, 498, 512, 574, 621, 846, 917, 1163, 1184, 
1226, 1246, 1251, 1263, 1383, 1394, 1397, 1411, 1482, 1493, 1507, 
1518, 1534, 1624, 1625, 1641, 1693, 1788 


Considering the first eight consecutive intervals of length 200 hours, the numbers of observed failures per interval are 1, 3, 3, 1, 2, 2, 7, 6. Thus, these eight values represent a sample of size eight from $\mathrm{POI}(200\lambda)$. Estimating the mean of the Poisson variable from these count data gives $\hat\mu = \bar x = 3.125 = 200\hat\lambda$, and $\hat\lambda = 0.0156$. The two estimates obtained for $\lambda$ are quite consistent, although, of course, they are not identical.
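Both estimates of $\lambda$ in the air-conditioning example can be reproduced in a few lines of Python. The time-between-failures data are those listed above.

```python
from itertools import accumulate

gaps = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5, 12, 120,
        11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95]   # hours between failures

# Estimate 1: gaps ~ EXP(1/lambda), so lambda-hat = 1 / (sample mean).
lam_from_gaps = len(gaps) / sum(gaps)

# Estimate 2: counts of failures in consecutive 200-hour windows ~ POI(200 lambda).
times = list(accumulate(gaps))              # successive failure times
width, n_windows = 200, 8                   # the eight complete windows before 1788 h
counts = [sum(1 for t in times if j * width <= t < (j + 1) * width)
          for j in range(n_windows)]
lam_from_counts = sum(counts) / n_windows / width

print(round(lam_from_gaps, 4), counts, round(lam_from_counts, 4))
```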


NONHOMOGENEOUS POISSON PROCESS 


The Poisson process is an important model for the failure times of a repairable system. In this terminology, the HPP assumptions imply that the time to first failure is a random variable that follows the exponential distribution, and also that the time between failures is an independent exponential variable. The assumption of a constant failure intensity parameter $\lambda$ suggests that the system is being maintained and not wearing out or degrading. If the system is wearing out, then the model should be generalized to allow $\lambda$ to be an increasing function of $t$. More generally, we might want to allow the intensity to be an arbitrary nonnegative function of $t$.

We can model this if, in part 3 of Theorem 3.2.4, we replace the constant $\lambda$ with a function of $t$, denoted by $\lambda(t)$. A similar derivation yields another type of Poisson process, known as a nonhomogeneous Poisson process (NHPP).

If $X(t)$ denotes the number of occurrences in a specified interval $[0, t]$ for an NHPP, then it can be shown that

$$X(t) \sim \mathrm{POI}(\mu(t))$$

where

$$\mu(t) = \int_0^t \lambda(s)\,ds$$

The CDF for the time to first occurrence, $T_1$, now becomes

$$F_1(t) = 1 - \exp[-\mu(t)]$$

An important choice for a nonhomogeneous intensity function is

$$\lambda(t) = (\beta/\theta)(t/\theta)^{\beta - 1}$$

which gives

$$\mu(t) = (t/\theta)^\beta$$


In this case the time to first occurrence follows a Weibull distribution, $\mathrm{WEI}(\theta, \beta)$. This intensity function is an increasing function of $t$ if $\beta > 1$ and a decreasing function of $t$ if $\beta < 1$. The $\beta < 1$ case might apply to a developmental situation, in which the system is being improved over time. Note that the times between consecutive failures are not independent Weibull variables in this case.
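One standard way to simulate the power-law NHPP is to transform a unit-rate HPP by $\mu^{-1}(s) = \theta s^{1/\beta}$. This time-transformation method is not described in the text. The sketch below uses it to check empirically that $P[T_1 \le t] = 1 - \exp[-(t/\theta)^\beta]$ and that $E[X(t)] = (t/\theta)^\beta$. The parameter values and seed are arbitrary.

```python
import math
import random


def simulate_power_law_nhpp(theta, beta, t_end, rng):
    """Arrival times on [0, t_end] for intensity (beta/theta)(t/theta)**(beta-1).

    Time-transformation method: if S_1 < S_2 < ... are arrival times of a
    unit-rate HPP, then T_i = mu^{-1}(S_i) = theta * S_i**(1/beta) are NHPP arrivals.
    """
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)
        t = theta * s ** (1.0 / beta)
        if t > t_end:
            return times
        times.append(t)


rng = random.Random(12345)
theta, beta, t0, t_end, reps = 100.0, 2.0, 50.0, 200.0, 20000
runs = [simulate_power_law_nhpp(theta, beta, t_end, rng) for _ in range(reps)]

p_first_by_t0 = sum(1 for ts in runs if ts and ts[0] <= t0) / reps
mean_count = sum(len(ts) for ts in runs) / reps

print(p_first_by_t0, 1 - math.exp(-(t0 / theta) ** beta))   # Weibull CDF of T_1
print(mean_count, (t_end / theta) ** beta)                  # mu(t_end)
```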


COMPOUND POISSON PROCESS 


It was noted earlier that one characteristic of the Poisson distribution is that the mean and variance have the same value. In some cases this property may not be valid, and a more flexible model is required. One type of generalization is to consider mixtures of distributions. For example, if a fraction $p$, called type 1, of the population follows $\mathrm{POI}(\mu_1)$, and the fraction $1 - p$, called type 2, follows $\mathrm{POI}(\mu_2)$, then the CDF for the population distribution is given by

$$\begin{aligned} F(x) &= P[X \le x \mid \text{type 1}]P[\text{type 1}] + P[X \le x \mid \text{type 2}]P[\text{type 2}] \\ &= F_1(x; \mu_1)p + F_2(x; \mu_2)(1 - p) \end{aligned}$$

The pdf would be a similar mixture of the two separate pdf’s.
More generally, if the population is a mixture of $k$ types, with fraction $p_i$ following the pdf $f_i(x)$, then

$$f(x) = \sum_{i=1}^{k} p_i f_i(x), \qquad \sum_{i=1}^{k} p_i = 1$$

If the $f_i(x)$ are all Poisson pdf’s, but with differing means, then

$$f(x) = \sum_{i=1}^{k} p_i e^{-\mu_i}\mu_i^x/x!, \qquad \sum_{i=1}^{k} p_i = 1 \qquad (16.5.1)$$
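Equation (16.5.1) is straightforward to evaluate. The following minimal sketch uses a hypothetical two-type fleet ($p = 0.3$ with $\mu_1 = 2$; $1 - p = 0.7$ with $\mu_2 = 8$). It shows that the mixture's variance exceeds its mean, which is the departure from Poisson behavior that motivates this section.

```python
import math


def mixed_poisson_pmf(x, weights, means):
    """Equation (16.5.1): sum_i p_i e^{-mu_i} mu_i^x / x!."""
    return sum(p * math.exp(-mu) * mu ** x / math.factorial(x)
               for p, mu in zip(weights, means))


weights, means = [0.3, 0.7], [2.0, 8.0]     # hypothetical two-type fleet
pmf = [mixed_poisson_pmf(x, weights, means) for x in range(60)]   # tail beyond 59 is negligible
mean = sum(x * f for x, f in enumerate(pmf))
var = sum(x * x * f for x, f in enumerate(pmf)) - mean ** 2
print(f"mean = {mean:.2f}, variance = {var:.2f}")
```

Here the mean is $0.3(2) + 0.7(8) = 6.2$. The variance is $6.2 + [0.3(4) + 0.7(64) - 6.2^2] = 13.76$: the mixture adds the between-type spread of the $\mu_i$ to the within-type Poisson variance.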


For example, suppose that a fleet of $k$ types of airplanes is considered, with fraction $p_i$ of type $i$. These also could be the same type of airplane used under $k$ different conditions. Assume that the number of air conditioner failures in a specified interval $[0, t]$ from an airplane of type $i$ follows $\mathrm{POI}(\mu_i)$. Now, if an airplane is selected at random from this fleet of airplanes, then the number of air conditioner failures in time $[0, t]$ for that airplane is a random variable that follows the mixed Poisson distribution, given by equation (16.5.1).

This situation is equivalent to assuming that, given $\mu$, the conditional density of the number of failures, $f_{X|\mu}(x)$, is $\mathrm{POI}(\mu)$, and that $\mu$ is a random variable that takes on the value $\mu_i$ with probability $p_i$. In this example the $\mu_i$ are fixed values, and the effect of drawing an airplane at random is to produce a random variable $\mu$ distributed over these values.

Now we consider, at least conceptually, a large fleet of airplanes in which, for any given airplane, the number of air conditioner failures in $[0, t]$ follows a $\mathrm{POI}(\mu)$ with $\mu = \lambda t$; however, $\lambda$ may be different from plane to plane. In particular, for this conceptually large population, we assume that $\lambda$ is a continuous random variable that follows a gamma distribution, $\mathrm{GAM}(\gamma, \kappa)$. That is,

$$f_\lambda(\lambda) = \frac{\lambda^{\kappa-1}e^{-\lambda/\gamma}}{\Gamma(\kappa)\gamma^\kappa} \qquad 0 < \lambda < \infty$$

and

$$f_{X|\lambda}(x) = e^{-\lambda t}(\lambda t)^x/x! \qquad x = 0, 1, 2, \ldots$$


Note that in the context of a Bayesian analysis of a Poisson model, the density 
f(A) corresponds to a prior density for the parameter, and the mathematics 
involved here is essentially equivalent to that involved in the associated Bayesian 
development. The differences between the two problems depend on the philos- 
ophy for introducing a density function for the parameter, and the interpretation 
of the results. In a Bayesian analysis the parameter may have been considered 
fixed, but the prior density reflects a degree of belief about the value of the 
parameter or some previous information about the value of the parameter. 




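The marginal density for the number of failures in time $[0, t]$ for an airplane selected at random is given in this case by

$$\begin{aligned} f_X(x) &= \int_0^\infty f(x, \lambda)\,d\lambda \\ &= \int_0^\infty f_{X|\lambda}(x) f_\lambda(\lambda)\,d\lambda \\ &= \int_0^\infty \frac{e^{-\lambda t}(\lambda t)^x}{x!}\,\frac{\lambda^{\kappa-1}e^{-\lambda/\gamma}}{\Gamma(\kappa)\gamma^\kappa}\,d\lambda \\ &= \binom{x + \kappa - 1}{x}\frac{(\gamma t)^x}{(1 + \gamma t)^{x+\kappa}} \qquad x = 0, 1, \ldots \end{aligned}$$

where $0 < \kappa < \infty$ and $0 < \gamma < \infty$. This is a form of negative binomial distribution, with $p = 1/(1 + \gamma t)$, and it is referred to as a compound Poisson distribution with a gamma-compounding density.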

Thus, the negative binomial distribution represents a generalization of the 
Poisson distribution, and it converges to the Poisson distribution when the 
gamma prior density becomes degenerate at a constant. The negative binomial 
model is used frequently as an alternative to the Poisson model in analyzing 
count data, particularly when the variance and the mean cannot be assumed 
equal.

The mean and variance for the negative binomial variable in the above notation are given by

$$E(X) = \kappa\gamma t = tE(\lambda)$$

$$\mathrm{Var}(X) = \kappa\gamma t(\gamma t + 1) = tE(\lambda) + t^2\,\mathrm{Var}(\lambda)$$

We see that $\mathrm{Var}(X) > E(X)$, and the Poisson case holds as $\mathrm{Var}(\lambda) = \kappa\gamma^2 \to 0$ with $E(\lambda)$ held fixed. Of course, other compound Poisson models can be obtained by considering compounding densities other than the gamma density; however, the gamma density is a very flexible two-parameter density, and it is mathematically convenient. The unknown parameters in this case are $\kappa$ and $\gamma$, and techniques developed for the negative binomial model may be used to estimate these parameters based on observed values of $x$.

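If one follows through Bayes' rule, then an expression for the conditional density of $\lambda$ given $x$ may be obtained:

$$f_{\lambda|x}(\lambda) = \frac{f(x, \lambda)}{f_X(x)} = \frac{f_{X|\lambda}(x) f_\lambda(\lambda)}{f_X(x)}$$

Simplification shows that

$$\lambda \mid x \sim \mathrm{GAM}(\gamma/(\gamma t + 1), x + \kappa)$$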
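The gamma-compounding result can be verified numerically. Integrate $f_{X|\lambda}(x) f_\lambda(\lambda)$ over $\lambda$ and compare with the closed form. Do the same for the posterior mean implied by $\lambda \mid x \sim \mathrm{GAM}(\gamma/(\gamma t + 1), x + \kappa)$, which is $(x + \kappa)\gamma/(\gamma t + 1)$. The parameter values ($\kappa = 4$, $\gamma = 0.0025$, $t = 1000$) and the integration cutoff are hypothetical choices for this sketch.

```python
import math


def nb_pmf(x, kappa, gamma_, t):
    """Closed form: C(x+kappa-1, x) (gamma t)^x / (1 + gamma t)^(x+kappa)."""
    log_coef = math.lgamma(x + kappa) - math.lgamma(kappa) - math.lgamma(x + 1)
    return math.exp(log_coef + x * math.log(gamma_ * t)
                    - (x + kappa) * math.log1p(gamma_ * t))


def mixture_integral(x, kappa, gamma_, t, power=0, upper=0.1, steps=20000):
    """Simpson's rule for the integral of lam**power * f_{X|lam}(x) * f_lam(lam) over lam."""
    def integrand(lam):
        if lam <= 0.0:
            return 0.0                 # integrand vanishes at 0 when x + kappa > 1
        log_pois = -lam * t + x * math.log(lam * t) - math.lgamma(x + 1)
        log_gam = ((kappa - 1) * math.log(lam) - lam / gamma_
                   - math.lgamma(kappa) - kappa * math.log(gamma_))
        return lam ** power * math.exp(log_pois + log_gam)
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return total * h / 3.0


kappa, gamma_, t = 4.0, 0.0025, 1000.0     # hypothetical GAM(gamma, kappa) and exposure
for x in (0, 5, 10, 20):
    marginal = mixture_integral(x, kappa, gamma_, t)
    post_mean = mixture_integral(x, kappa, gamma_, t, power=1) / marginal
    print(x, nb_pmf(x, kappa, gamma_, t), marginal,
          post_mean, (x + kappa) * gamma_ / (gamma_ * t + 1))
```

Summing `nb_pmf` over $x$ likewise recovers $E(X) = \kappa\gamma t = 10$ and $\mathrm{Var}(X) = \kappa\gamma t(\gamma t + 1) = 35$ for these values.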
COMPOUND EXPONENTIAL DISTRIBUTION


If air conditioner failure for a given airplane occurs according to a Poisson process with failure intensity $\lambda$, then the time to first failure, $T$, follows an exponential distribution. If we again assume that the intensity parameter varies from airplane to airplane according to a gamma distribution, then the time to first failure for an airplane selected at random follows a compound exponential distribution:

$$\begin{aligned} f_T(t) &= \int_0^\infty f_{T,\lambda}(t, \lambda)\,d\lambda \\ &= \int_0^\infty f_{T|\lambda}(t) f_\lambda(\lambda)\,d\lambda \\ &= \int_0^\infty \lambda e^{-\lambda t}\,\frac{\lambda^{\kappa-1}e^{-\lambda/\gamma}}{\Gamma(\kappa)\gamma^\kappa}\,d\lambda \\ &= \kappa\gamma(\gamma t + 1)^{-(\kappa+1)} \qquad 0 < t < \infty \end{aligned}$$

This is a form of the Pareto distribution, $T \sim \mathrm{PAR}(1/\gamma, \kappa)$.
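A simulation makes the compounding concrete. Draw $\lambda$ from GAM($\gamma$, $\kappa$), then draw $T$ from an exponential distribution with rate $\lambda$. Since $P[T > t] = \int_0^\infty e^{-\lambda t} f_\lambda(\lambda)\,d\lambda = (1 + \gamma t)^{-\kappa}$, the empirical survival proportion should match this Pareto form. Python's `random.gammavariate(alpha, beta)` treats `beta` as a scale parameter, which matches GAM($\gamma$, $\kappa$) with scale $\gamma$. The parameter values and seed are arbitrary.

```python
import random

rng = random.Random(7)
kappa, gamma_, t, reps = 3.0, 0.01, 50.0, 40000   # hypothetical values

survived = 0
for _ in range(reps):
    lam = rng.gammavariate(kappa, gamma_)    # intensity for a randomly chosen airplane
    T = rng.expovariate(lam)                 # its time to first failure
    survived += T > t

print(survived / reps, (1 + gamma_ * t) ** (-kappa))   # empirical vs Pareto survival
```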


Proschan (1963) gave air conditioner failure data for several airplanes. Ten of these airplanes had at least 1000 flying hours. For these 10 planes the numbers of failures, $x$, in 1000 hours are recorded below:


Airplane: 7908  7909  7910  7911  7912  7913  7914  7915  8044  8045
x:           8    16     9     6    10    13    16     4     9    12


For this set of data, $s^2 = 15.79$ and $\bar x = 10.30$. These results suggest that the mean and variance of $X$ may not be equal and that the compound Poisson (negative binomial) model may be preferable to the Poisson model in this case; however, additional distributional results are needed to indicate whether this magnitude of difference between $s^2$ and $\bar x$ reflects a true difference, or whether it could result from random variation. It has been shown in the literature for this case that approximately

$$\frac{(n - 1)S^2}{\bar X} \sim \chi^2(n - 1)$$

when $X$ does follow a Poisson model. In our problem $(n - 1)s^2/\bar x = 9(15.79)/10.30 = 13.80$. Now $P[\chi^2(9) > 13.8] = 0.13$; thus, the observed ratio is somewhat larger than would be expected, but there is approximately a 13% chance of getting such a result when the Poisson model is valid.
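The dispersion check can be reproduced as follows. The standard library has no chi-square distribution, so `chi2_sf` below is our own helper. It is based on the power series for the incomplete gamma function, which is adequate for moderate arguments like this one.

```python
import math
from statistics import mean, variance


def chi2_sf(x, df):
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), via the incomplete-gamma series."""
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a))


failures = [8, 16, 9, 6, 10, 13, 16, 4, 9, 12]   # airplanes 7908 ... 8045
n = len(failures)
xbar, s2 = mean(failures), variance(failures)    # variance() uses the n - 1 divisor
ratio = (n - 1) * s2 / xbar
p_value = chi2_sf(ratio, n - 1)
print(f"xbar = {xbar:.2f}, s^2 = {s2:.2f}, ratio = {ratio:.2f}, p = {p_value:.2f}")
```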


Greenwood and Yule (1920) studied the number of accidents during a five-week period for 647 women in a shell factory. It turned out that a Poisson process appeared reasonable for each individual, but the intensity varied from individual to individual. That is, some workers were more accident-prone than others. They found that a gamma distribution provided a good compounding distribution for the accident intensities over workers, and that the negative binomial model provided a good model for the number of accidents of a worker selected at random.


SUMMARY 


Our purpose in this chapter was to introduce some basic concepts of reliability 
and to develop the mathematical aspects of the statistical analyses of some
common life-testing models. 

Various characterizations of models in reliability theory can be given. The most basic is the reliability function (or survivor function), which gives the probability that failure occurs after time $t$, for each positive $t$. The hazard function (or failure-rate function) provides another way to characterize a reliability model. The hazard function gives a means of interpreting the model in terms of aging or wearout. If the hazard function is constant, then the model is exponential. An increasing hazard function is generally interpreted as reflecting aging or wearout of the unit under test.

The gamma and Weibull distributions are two different models that include the exponential model, but also allow, by proper choice of the shape parameter, for an increasing hazard function. These models also admit the possibility of a decreasing hazard function, although this is less common in reliability applications.

Most of the statistical analyses for parametric life-testing models have been 
developed for the exponential and Weibull models. The exponential model is 
generally easier to analyze because of the simplicity of the functional form and 
some special mathematical properties that hold for the exponential model.
However, the Weibull model is more flexible, and thus it provides a more realistic 
model in many applications, particularly those involving wearout or aging. 
Although the Weibull distribution is not a location-scale model, it is related by 
means of a log transformation to the extreme-value model. This makes the deri- 
vation of confidence intervals and tests of hypotheses possible, because pivotal 
quantities can be constructed from the MLEs. 


EXERCISES 


Consider a random sample of size 25 from $f(x) = (1 + x)^{-2}$, $0 < x < \infty$.
(a) Give the likelihood function for the first 10 ordered observations.
(b) Give the likelihood function for the data censored at time $x = 9$.
(c) What is the probability of getting zero observations by $x = 9$?

(d) What is the expected number of observations by time $x = 9$?

(e) What sample size would be needed so that the expected number of observations by $x = 9$ would be 20?

(f) Approximately what sample size would be needed to be 90% sure of observing 40 or 
more observations by x = 9? 

(g) Given that R = 20 observations occurred by x = 9, what is the joint conditional 
density of these 20 observations? 


Rework Exercise 1, assuming X ~ EXP(4). 


Suppose that $f(x) = 1/\theta$, $0 < x < \theta$. Find the MLE of $\theta$ based on the first $r$ ordered observations from a sample of size $n$.


Suppose that $X \sim \mathrm{PAR}(\theta, \kappa)$.
(a) Determine the hazard function.
(b) Express the $1 - p$ percentile, $x_{1-p}$.


Can $h(x) = e^{-x}$ be a hazard function?
Find the pdf associated with the hazard function $h(x) = e^{x}$.


A component in a repairable system has a mean time to failure of 100 hours. Five spare components are available.
(a) What is the expected operation time to be obtained?
(b) If $T_i \sim \mathrm{EXP}(100)$ for each component and the five spares, what is the probability that the system will still be in operation after 300 hours?


(c) How many spares are needed to have a system reliability of 0.95 at 300 hours? 


The six identical components considered in Exercise 7(b) are connected as a parallel 
system. 
(a) What is the mean time to failure of this parallel system?


(b) How many of these components would be needed in a parallel system to achieve a 
mean time to failure of 300 hours? 


(c) What is the reliability of this parallel system at 200 hours?


The six components considered in Exercise 7(b) now are connected in series. 
(a) What is the mean time to failure of this series system? 
(b) What is the reliability of the system at 10 hours? 
(c) What mean time to failure would be required for each component for the series 
system to have a reliability of 0.90 at 20 hours? 
(d) Give the hazard function for the series system. 


Rework Exercise 9, assuming that the $T_i \sim \mathrm{PAR}(400, 5)$, $i = 1, \ldots, 6$.


Rework Exercise 8(c), assuming that $T_i \sim \mathrm{PAR}(400, 5)$.
The failure time of a certain electronic component follows an exponential distribution, $X \sim \mathrm{EXP}(\theta)$. In a random sample of size $n = 25$, one observes $\bar x = 75$ days.

(a) Compute the MLE of the reliability at time 8 days, R(8). 

(b) Compute an unbiased estimate of R(8). 

(c) Compute a 95% lower confidence limit for R(8). 

(d) Compute a 95% lower tolerance limit for proportion 0.90. 


Grubbs (1971) gives the following mileages for the failure times of 19 personnel carriers: 
162, 200, 271, 302, 393, 508, 539, 629, 706, 777, 884, 1008, 
1101, 1182, 1463, 1603, 1984, 2355, 2880 
Assume that these observations follow a two-parameter exponential distribution with known threshold parameter, $\eta = 100$; that is, $X \sim \mathrm{EXP}(\theta, 100)$.
(a) Give the distribution of $2n(\bar X - 100)/\theta$.


(b) Compute a lower 90% confidence limit for $\theta$.
(c) Compute a 95% lower tolerance limit for proportion 0.80. 


Rework Exercise 13, assuming that only the first 15 failure times for the 19 carriers were
recorded. 


Wilk et al. (1962) give the first 31 failure times (in weeks) from an accelerated life test of 34 
transistors as follows: 


3, 4, 5, 6, 6, 7, 8, 8, 9, 9, 9, 10, 10, 11, 11, 11, 13,
13, 13, 13, 13, 17, 17, 19, 19, 25, 29, 33, 42, 42, 52


It may be that a threshold parameter is needed in this problem, but for illustration purposes suppose that $X \sim \mathrm{EXP}(\theta)$.

(a) Estimate $\theta$.

(b) Compute a 90% lower confidence limit for $\theta$.

(c) Compute a 90% lower tolerance limit for proportion 0.95.

(d) Compute a 50% lower tolerance limit for proportion 0.95.

(e) Estimate $x_{0.95}$.


Suppose in Exercise 15 that the experiment on the 34 transistors had been terminated after 50 weeks.

(a) Estimate $\theta$.

(b) Set a lower 0.90 confidence limit on $\theta$.

(c) Compute a lower (0.90, 0.95) tolerance limit.
One hundred light bulbs are placed on test, and the experiment is continued for one year. As light bulbs fail they are replaced with new bulbs, and at the end of one year a total of 85 bulbs have failed. Assume $\mathrm{EXP}(\theta)$.

(a) Estimate $\theta$.

(b) Test $H_0: \theta \ge 1.5$ years against $H_a: \theta < 1.5$ at $\alpha = 0.05$.

(c) Compute a 90% two-sided confidence interval for $\theta$.
(d) If bulbs are guaranteed for six months, estimate what percentage of the bulbs will have to be replaced.


(e) What warranty period should be offered if one wishes to be 90% confident that at least 95% of the bulbs will survive the warranty period?


Consider the ball bearing data of Exercise 21 in Chapter 13. If we assume that this set of data is a complete sample from a Weibull distribution, then the MLEs are $\hat\beta = 2.102$ and $\hat\theta = 81.88$.


(a) Use equation (16.4.33) to compute an approximate lower 0.95 level confidence limit for $\beta$.

(b) Use equation (16.4.34) to compute an approximate lower 0.90 level confidence limit for $R_X(75)$.

(c) Compute the MLE of the 10th percentile, $x_{0.10}$.


(d) Use equation (16.4.36) to compute a lower 0.95 level tolerance limit for proportion 0.90.


(e) Use equation (16.4.38) to compute an approximate lower 0.90 level confidence limit for $\theta$.


Consider the censored Weibull data of Example 16.4.1.
(a) Compute the simple estimates $\tilde\delta$ and $\tilde\beta = 1/\tilde\delta$, using equation (16.4.42).
(b) Compute the simple estimates $\tilde\xi$ and $\tilde\theta = \exp(\tilde\xi)$, using equation (16.4.43).


Compute the simple estimates of $\delta$, $\beta$, $\xi$, and $\theta$, using equations (16.4.40) and (16.4.41), with the ball bearing data of Exercise 21 in Chapter 13.


Let $X \sim \mathrm{POI}(\mu)$. Show that

$$f(x - 1; \mu) < f(x; \mu) \quad \text{for } x < \mu$$

and

$$f(x - 1; \mu) > f(x; \mu) \quad \text{for } x > \mu$$

Verify equation (16.2.9).


Let $X$ denote the number of people seeking a haircut during a one-hour period, and suppose that $X \sim \mathrm{POI}(4)$. If a barber will service three people in an hour:

(a) What is the probability that all customers arriving can be serviced?

(b) What is the probability that all but one potential customer can be serviced?

(c) How many people must the barber be able to service in an hour to be 90% likely to service everyone who arrives?

(d) What is the expected number of customers arriving per hour? Per 8-hour day?
(e) What is the expected number of customers serviced per hour?
(f) If two barbers are available, what is the expected number of customers serviced per hour?


Assume that the number of emissions of particles from a radioactive source is a homogeneous Poisson process with intensity $\lambda = 2$ per second.


(a) What is the probability of 0 emissions in 1 second? 
(b) What is the probability of 0 emissions in 10 seconds?

(c) What is the probability of 3 emissions in 1 second? 

(d) What is the probability of 30 emissions in 10 seconds? 

(e) What is the probability of 20 emissions or less in 10 seconds? 


In World War II, London was divided into n = 576 small areas of 1/4 square kilometer
each. The number of areas, m_y, receiving y flying bomb hits is given by Clarke (1946), and
these are listed below.

y       0     1     2     3     4    ≥5
m_y   229   211    93    35     7     1

The total number of hits was 537. Although clustering might be expected in this case, the
Poisson model was found to provide a good fit to these data. Assume that Y ~ POI(μ).

(a) Estimate μ from the data.

(b) Under the Poisson assumption, compute the estimated expected number of areas
receiving y hits, ê_y = n f(y; μ̂), for each y, and compare these values to the observed
values m_y.

(c) What is the estimated probability of an area receiving more than one hit?
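The Poisson MLE of μ is the sample mean, here total hits divided by number of areas. The fitted counts are n f(y; μ̂). The Python sketch below lumps the upper tail into a "5 or more" cell so that the expected counts sum to n. That lumping is our own choice, made to match the last observed category.

```python
import math

n_areas = 576
total_hits = 537
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, "5+": 1}

mu_hat = total_hits / n_areas  # Poisson MLE: the sample mean

def poisson_pmf(y: int, mu: float) -> float:
    return math.exp(-mu) * mu ** y / math.factorial(y)

expected = {y: n_areas * poisson_pmf(y, mu_hat) for y in range(5)}
# Lump the upper tail so the expected counts add to n.
expected["5+"] = n_areas - sum(expected.values())

print(f"mu_hat = {mu_hat:.4f}")
for y, m in observed.items():
    print(y, m, round(expected[y], 2))

# (c) estimated P[Y > 1] = 1 - f(0) - f(1)
p_more_than_one = 1 - poisson_pmf(0, mu_hat) - poisson_pmf(1, mu_hat)
print(round(p_more_than_one, 4))
```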


Mullet (1977) suggests that the goals scored per game by the teams in the National
Hockey League follow independent Poisson variables. The average numbers of goals
scored per game at home and away by each team in the 1973-74 season are given in
Exercise 2 of Chapter 15.
Assume a Poisson model with these means.
(a) What is the probability that Boston scores more than three goals in any away
game?
(b) What is the probability that Boston scores more than six goals in two away games?
(c) What is the most likely number of goals scored by Boston in one away game?
(d) If the first eight teams play at home against the other eight teams, what is the
distribution of S, the total number of home goals scored? What is the distribution of
T, the total number of home and away goals scored in the eight games?
(e) What is the distribution of the total number of goals scored by Boston in a 78-game
season?
(f) If Boston plays Atlanta in Atlanta, what is the probability that Boston wins? That is,
find P[X < Y], where Y represents the number of Boston goals and X represents the
number of Atlanta goals.
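Part (f) has no closed form, but conditioning on Y reduces it to a single sum: P[X < Y] = Σ_y f_Y(y) P[X ≤ y − 1]. The Python sketch below computes that sum. The means 3.0 and 3.5 are hypothetical placeholders, because the Exercise 2 values are not reproduced here. The truncation point `k_max` is also our own choice.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def prob_x_less_than_y(mu_x: float, mu_y: float, k_max: int = 60) -> float:
    """P[X < Y] for independent X ~ POI(mu_x), Y ~ POI(mu_y).

    Conditions on Y = y: P[X < Y] = sum_y f_Y(y) * P[X <= y - 1].
    The sum is truncated at k_max, which is ample for hockey-sized means.
    """
    total = 0.0
    cdf_x = 0.0  # running value of P[X <= y - 1]
    for y in range(k_max + 1):
        total += poisson_pmf(y, mu_y) * cdf_x
        cdf_x += poisson_pmf(y, mu_x)
    return total

# Hypothetical means, for illustration only; use the Exercise 2 values in practice.
mu_atlanta_home, mu_boston_away = 3.0, 3.5
p_win = prob_x_less_than_y(mu_atlanta_home, mu_boston_away)
p_tie = sum(poisson_pmf(k, mu_atlanta_home) * poisson_pmf(k, mu_boston_away)
            for k in range(61))
print(round(p_win, 4), round(p_tie, 4))
```

For parts (b), (d), and (e), recall that a sum of independent Poisson variables is again Poisson, with mean equal to the sum of the means.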


The probability of a typographical error on a page is 0.005. Using a Poisson 
approximation: 

(a) What is the expected number of errors in a 500-page book? 

(b) What is the probability of having five or fewer errors in a 500-page book? 

(c) What size sample (of pages) is needed to be 90% sure of finding at least one error? 


A certain mutation occurs in one out of 1000 offspring. How many offspring must be 
examined to be 20% sure of observing at least one mutation? 
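Both exercises use the same Poisson approximation. If each page (or offspring) independently shows the event with probability p, the count over n units is approximately POI(np). The condition P[at least one] = 1 − e^(−np) ≥ γ then gives n ≥ −ln(1 − γ)/p. Here is a Python sketch of that reasoning. The helper names are ours.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(x + 1))

def units_needed(p: float, confidence: float) -> int:
    """Smallest n with 1 - exp(-n p) >= confidence (Poisson approximation)."""
    return math.ceil(-math.log(1 - confidence) / p)

p_error = 0.005
mu_book = 500 * p_error                      # (a) expected errors in 500 pages
print(mu_book)
print(round(poisson_cdf(5, mu_book), 4))     # (b) P[at most five errors]
print(units_needed(p_error, 0.90))           # (c) pages needed
print(units_needed(1 / 1000, 0.20))          # offspring needed in the mutation exercise
```

The exact binomial condition 1 − (1 − p)^n ≥ γ gives a slightly smaller answer for the typographical-error case: 460 pages rather than 461.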
Suppose that X ~ BIN(20, 0.1).
(a) Compute P[X ≤ 5].
(b) Approximate P[X ≤ 5] with a Poisson distribution.
(c) Approximate P[X ≤ 5] with a normal distribution.
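Here is a Python sketch comparing the three answers. The normal approximation uses a continuity correction, P[X ≤ 5] ≈ Φ((5.5 − np)/√(npq)). That is a common choice, but it is our assumption and not necessarily the convention used elsewhere in the book.

```python
import math

n, p = 20, 0.1
q = 1 - p

# (a) exact binomial probability P[X <= 5]
exact = sum(math.comb(n, x) * p ** x * q ** (n - x) for x in range(6))

# (b) Poisson approximation with mu = n p = 2
mu = n * p
poisson = sum(math.exp(-mu) * mu ** x / math.factorial(x) for x in range(6))

# (c) normal approximation with continuity correction
def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

normal = phi((5.5 - n * p) / math.sqrt(n * p * q))

print(round(exact, 4), round(poisson, 4), round(normal, 4))
```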


In Exercise 23, what is the probability that the barber can finish a 10-minute coffee break 
before the first customer shows up? What is the probability that a person arriving after 30 
minutes will not get serviced? What is the mean waiting time until the first customer 
arrives? 


In Exercise 24, what is the probability of at least one emission within 0.5 seconds? What is
the probability that the time until the third emission is less than 0.5 seconds? What is the
mean time until the third emission?
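Both waiting-time exercises use the link between a Poisson process with rate λ and its arrival times. The first arrival T_1 is exponential with mean 1/λ. The k-th arrival satisfies T_k ~ GAM(1/λ, k), and P[T_k ≤ t] = P[N(t) ≥ k]. The Python sketch below covers the parts with an unambiguous reading. The question about a person arriving after 30 minutes is left aside, because its intended reading depends on details of Exercise 23.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(x + 1))

def prob_kth_arrival_before(k: int, t: float, rate: float) -> float:
    """P[T_k <= t] = P[N(t) >= k] = 1 - F(k - 1; rate * t)."""
    return 1 - poisson_cdf(k - 1, rate * t)

# Barber, rate 4 per hour: a 10-minute break is 1/6 hour
p_break = math.exp(-4 * (10 / 60))      # P[T_1 > 1/6]
mean_wait_minutes = 60 / 4              # E[T_1] = 1/rate hours, in minutes

# Emissions, rate 2 per second
p_one_in_half_sec = prob_kth_arrival_before(1, 0.5, 2)
p_third_in_half_sec = prob_kth_arrival_before(3, 0.5, 2)
mean_third = 3 / 2                      # E[T_3] = k / rate seconds

print(round(p_break, 4), mean_wait_minutes)
print(round(p_one_in_half_sec, 4), round(p_third_in_half_sec, 4), mean_third)
```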


Suppose that the breakdowns of a repairable system occur according to a
nonhomogeneous Poisson process with failure intensity λ(t) = 2t/9, where time is
measured in days.
(a) What is the mean number of breakdowns in one week?
(b) What is the probability of five or fewer breakdowns in a week?
(c) What is the probability that the first breakdown will occur in less than one day?
(d) What is the average time to the first breakdown?
(e) If 10 independent systems were in operation, what would be the mean time to the
first breakdown from any of the 10 systems?
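For a nonhomogeneous Poisson process, N(t) ~ POI(m(t)) with mean value function m(t) = ∫_0^t λ(s) ds, which here is t^2/9. The first breakdown satisfies P[T > t] = e^(−m(t)) = e^(−(t/3)^2), a Weibull distribution with θ = 3 and β = 2. The Python sketch below assumes that 10 independent systems superpose into a process with intensity 10λ(t).

```python
import math

def mean_value(t: float, systems: int = 1) -> float:
    """m(t) = integral from 0 to t of systems * 2s/9 ds = systems * t**2 / 9."""
    return systems * t ** 2 / 9

def poisson_cdf(x: int, mu: float) -> float:
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(x + 1))

week = 7.0
mu_week = mean_value(week)                            # (a) 49/9
p_five_or_fewer = poisson_cdf(5, mu_week)             # (b)
p_first_within_day = 1 - math.exp(-mean_value(1.0))   # (c)

def mean_first_breakdown(systems: int = 1) -> float:
    """Weibull mean theta * Gamma(1 + 1/beta) with beta = 2."""
    theta = 3 / math.sqrt(systems)   # solves (t/theta)**2 = systems * t**2 / 9
    return theta * math.gamma(1 + 1 / 2)

print(round(mu_week, 4), round(p_five_or_fewer, 4), round(p_first_within_day, 4))
print(round(mean_first_breakdown(1), 4), round(mean_first_breakdown(10), 4))  # (d), (e)
```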


Let X_i ~ f_i(x) with E(X_i) = μ_i and Var(X_i) = σ_i^2, i = 1, ..., k. Find the mean and variance of the
mixed density

f(x) = Σ_{i=1}^{k} p_i f_i(x)

where 0 ≤ p_i and Σ p_i = 1.
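Since E(X^r) = Σ p_i E(X_i^r) for a mixture, the mean is Σ p_i μ_i. The variance is Σ p_i(σ_i^2 + μ_i^2) − (Σ p_i μ_i)^2. Below is a small Python check of these formulas against direct summation for a two-component Poisson mixture. The weights 2/3 and 1/3 echo the Boston home/away setup in the next exercise, but the means 3.0 and 5.0 are placeholders, not the Exercise 26 values.

```python
import math

def mixture_moments(weights, means, variances):
    """Mean and variance of f(x) = sum_i p_i f_i(x) from the component moments."""
    mean = sum(p * m for p, m in zip(weights, means))
    second = sum(p * (v + m ** 2) for p, m, v in zip(weights, means, variances))
    return mean, second - mean ** 2

# Illustrative two-component Poisson mixture (weights and means are assumptions).
weights, mus = [2 / 3, 1 / 3], [3.0, 5.0]

def mixed_pmf(x: int) -> float:
    return sum(p * math.exp(-mu) * mu ** x / math.factorial(x) for p, mu in zip(weights, mus))

xs = range(80)  # truncation: the tail beyond 79 is negligible for these means
direct_mean = sum(x * mixed_pmf(x) for x in xs)
direct_var = sum(x ** 2 * mixed_pmf(x) for x in xs) - direct_mean ** 2

formula_mean, formula_var = mixture_moments(weights, mus, mus)  # Poisson: variance = mean
print(round(direct_mean, 6), round(formula_mean, 6))
print(round(direct_var, 6), round(formula_var, 6))
```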


In Exercise 26, Boston has two home games and one away game. 
(a) If one of these games is selected at random, what is the probability that the number 
of goals scored in it will be less than or equal to 4? 
(b) What is the expected number of goals scored in the game?


Assume that the 16 NHL teams given in Exercise 26 represent a random sample from a
conceptually large population of teams, and that the number of goals scored per home
game for any team selected at random follows a Poisson model for fixed μ,
X | μ ~ POI(μ), where μ ~ GAM(θ, 2).
(a) Estimate θ by using the average of the 16 at-home values to estimate the mean of
the gamma distribution.
(b) What is the marginal pdf of the number of at-home goals scored by a team selected
at random?
(c) Find P[X ≤ 4] using κ and θ values from (a).
(d) Estimate E(X) and Var(X).
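If X | μ ~ POI(μ) and μ ~ GAM(θ, κ), integrating μ out gives a negative binomial form: f(x) = Γ(x + κ)/[Γ(κ) x!] (θ/(1 + θ))^x (1/(1 + θ))^κ. The mean is κθ and the variance is κθ(1 + θ). The Python sketch below uses κ = 2 and a hypothetical θ = 1.75. In the exercise, θ̂ would be the average home score divided by κ.

```python
import math

def poisson_gamma_pmf(x: int, theta: float, kappa: float) -> float:
    """Marginal pmf of X when X | mu ~ POI(mu) and mu ~ GAM(theta, kappa).

    Gamma(x + kappa) / (Gamma(kappa) x!) * (theta/(1+theta))**x * (1/(1+theta))**kappa,
    evaluated on the log scale.
    """
    log_coef = math.lgamma(x + kappa) - math.lgamma(kappa) - math.lgamma(x + 1)
    return math.exp(log_coef + x * math.log(theta / (1 + theta)) - kappa * math.log(1 + theta))

kappa = 2.0
theta = 1.75   # hypothetical: in the exercise, theta_hat = (average home goals) / kappa

p_le_4 = sum(poisson_gamma_pmf(x, theta, kappa) for x in range(5))
mean, var = kappa * theta, kappa * theta * (1 + theta)
print(round(p_le_4, 4), mean, var)
```

The variance exceeds the mean by κθ^2. This extra variability comes from the spread of μ across teams.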
36. For the airplane air conditioner time-to-failure example in Section 15.5, suppose that
λ ~ GAM(0.0005, 20).
(a) What is the probability of no failure in 100 hours for a plane selected at random?
(b) What is the mean time to first failure of an airplane selected at random?
(c) What is the probability that the time to first failure is less than 100 hours?
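Below is a sketch in Python. It assumes that, given λ, the time to first failure T is exponential with failure rate λ, so P[T > t | λ] = e^(−λt). Averaging over λ ~ GAM(θ, κ) then gives P[T > t] = E[e^(−λt)] = (1 + θt)^(−κ), which is the gamma MGF evaluated at −t. It also gives E(T) = E(1/λ) = 1/[θ(κ − 1)] for κ > 1.

```python
theta, kappa = 0.0005, 20   # lambda ~ GAM(theta, kappa): mean kappa*theta = 0.01 per hour

def survival(t: float) -> float:
    """P[T > t] = E[exp(-lambda t)] = (1 + theta t)**(-kappa)."""
    return (1 + theta * t) ** (-kappa)

p_no_failure_100 = survival(100)                 # (a)
mean_first_failure = 1 / (theta * (kappa - 1))   # (b) E[1/lambda], valid for kappa > 1
p_fail_before_100 = 1 - survival(100)            # (c)

print(round(p_no_failure_100, 4), round(mean_first_failure, 2), round(p_fail_before_100, 4))
```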


REVIEW OF SETS 


The study of probability models requires a familiarity with some of the basic
notions of set theory.

A set is a collection of distinct objects. Other terms that sometimes are used
instead of set or collection are family and class. Sets usually are designated by
capital letters, A, B, C, ..., or in some instances with subscripted letters A_1, A_2,
A_3, .... In describing which objects are contained in a set A, two methods are
available:


1. The objects can be listed. For example, A = {1, 2, 3} is the set consisting
of the integers 1, 2, and 3.

2. A verbal description can be used. For example, the set A above consists
of “the first three positive integers.” A more formal way is to write
A = {x | x is an integer and 1 ≤ x ≤ 3}. More generally, if p(x) is a
statement about the object x, then {x | p(x)} consists of all objects x
such that p(x) is a true statement. Thus, if A = {x | p(x)}, then a is in A if
and only if p(a) is true. This also can be related to the listing method
if p(x) is the statement x = a_1, or x = a_2, or ..., or x = a_n, when
A = {a_1, a_2, ..., a_n}.
The individual objects in a set A are called elements. Other terms that sometimes
are used instead of element are member and point. In the context of probability,
the objects usually are called outcomes. When a is an element of A we
write a ∈ A, and otherwise a ∉ A. For example, 3 ∈ {1, 2, 3}, but 4 ∉ {1, 2, 3}.

In most problems we can restrict attention to a specific set of elements and no 
others. The universal set, which we will denote by S, is the set of all elements 
under consideration. In probability applications, such a set usually is called the 
sample space, and it consists of all outcomes of some experiment that is to be 
performed. 

Another special set, called the empty set or null set, is denoted by ∅. It is the
set that contains no elements. For example, {x | x is an integer and x^2 = 2} = ∅,
because the solutions x = ±√2 are not integers.

In some cases all of the elements in a set A also are contained in another set B.
If this is the case, then we say that A is a subset of B, denoted A ⊂ B. For
example, if A = {1, 2, 3} and B = {1, 2, 3, 4}, then A ⊂ B. It is always the case
that ∅ ⊂ A ⊂ S, for any set A under consideration.

There are standard ways to combine two or more sets into a new set: 


1. The intersection of two sets A and B, denoted by A ∩ B, is

A ∩ B = {x | x ∈ A and x ∈ B}

For example, if A = {1, 2, 3} and B = {2, 3, 4}, then A ∩ B = {2, 3}.
2. The union of two sets A and B, denoted by A ∪ B, is

A ∪ B = {x | x ∈ A or x ∈ B}

For example, if A and B are the sets given in part 1, then
A ∪ B = {1, 2, 3, 4}.
3. The complement of a set A, denoted by A' or Ā, is

A' = Ā = {x | x ∈ S and x ∉ A}

For example, if A is the set given in part 1 and S = {1, 2, 3, 4, 5}, then
A' = {4, 5}.
4. The difference of A and B is A − B = A ∩ B'.


Sometimes it is convenient to use a graphical device known as a Venn diagram.
Such diagrams for intersection, union, and complement are given in Figure A.1.
The points inside the rectangles are associated with S, and the points inside the
circles are associated with the sets A and B. The shaded regions correspond to
the intersection, union, and complement, respectively.

In some cases, two sets A and B have no elements in common. This can be
expressed by writing A ∩ B = ∅, and saying that A and B are disjoint. In probability
applications we say that A and B are mutually exclusive in this case. The
Venn diagram of disjoint sets corresponds to nonoverlapping circles, as shown in
Figure A.2.

[Figure A.1: Venn diagrams of A ∩ B, A ∪ B, and A']

[Figure A.2: Disjoint sets, A ∩ B = ∅]


The notions of intersection and union can be extended to more than two sets.
We can define the intersection and union of three sets A, B, and C to be,
respectively,

A ∩ B ∩ C = {x | x ∈ A and x ∈ B and x ∈ C}

and

A ∪ B ∪ C = {x | x ∈ A or x ∈ B or x ∈ C}

Another way to accomplish this would be to use parentheses along with the
definitions of intersection and union of two sets. For example,
A ∩ B ∩ C = A ∩ (B ∩ C) = (A ∩ B) ∩ C. To avoid ambiguity, it would be
desirable to establish that the way the sets are grouped with parentheses does not
make a difference. This and several other properties of “set algebra” are stated in
the following theorem.


Theorem A.1 For any subsets A, B, and C of S, the following equations are true:

1. A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C.
2. A ∪ B = B ∪ A and A ∩ B = B ∩ A.
3. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
4. A ∪ ∅ = A and A ∩ S = A.
5. A ∪ A' = S and A ∩ A' = ∅.


Each assertion can be verified easily by Venn diagrams; however, a more
formal way would involve showing that the set on either side of the equation is
included in the set on the other side. Equations 1, 2, and 3 are referred to as the
associative, commutative, and distributive laws, respectively.

Other useful identities are given in the following theorem. 


Theorem A.2 For any subsets A and B of S, the following equations are true:
1. (A')' = A.
2. ∅' = S and S' = ∅.
3. A ∪ A = A and A ∩ A = A.
4. A ∪ S = S and A ∩ ∅ = ∅.
5. A ∪ (A ∩ B) = A and A ∩ (A ∪ B) = A.
6. (A ∪ B)' = A' ∩ B' and (A ∩ B)' = A' ∪ B'.


The identities given in part 6, known as De Morgan’s laws, are particularly 
useful in many probability applications. 
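Python's `frozenset` operators `|`, `&`, and `-` give a quick way to sanity-check Theorems A.1 and A.2. The sketch below tests every triple of subsets of a small universal set S = {1, ..., 5}, which we chose for illustration. An exhaustive check over one finite S is only a sanity check, not a proof for arbitrary sets. The membership arguments described above remain the actual proofs.

```python
import itertools

S = frozenset(range(1, 6))  # a small universal set chosen for illustration
EMPTY = frozenset()

def comp(A):
    """Complement A' relative to the universal set S."""
    return S - A

subsets = [frozenset(c) for r in range(len(S) + 1)
           for c in itertools.combinations(sorted(S), r)]

def check_laws(A, B, C):
    return all([
        A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C,   # A.1, part 1
        A | B == B | A and A & B == B & A,                           # A.1, part 2
        A | (B & C) == (A | B) & (A | C),                            # A.1, part 3
        A & (B | C) == (A & B) | (A & C),
        A | EMPTY == A and A & S == A,                               # A.1, part 4
        A | comp(A) == S and A & comp(A) == EMPTY,                   # A.1, part 5
        comp(comp(A)) == A,                                          # A.2, part 1
        comp(EMPTY) == S and comp(S) == EMPTY,                       # A.2, part 2
        A | A == A and A & A == A,                                   # A.2, part 3
        A | S == S and A & EMPTY == EMPTY,                           # A.2, part 4
        A | (A & B) == A and A & (A | B) == A,                       # A.2, part 5
        comp(A | B) == comp(A) & comp(B),                            # De Morgan's laws
        comp(A & B) == comp(A) | comp(B),
    ])

print(all(check_laws(A, B, C) for A, B, C in itertools.product(subsets, repeat=3)))
```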

A third theorem gives identities that are useful when one set is a subset of 
another. 


Theorem A.3 The following statements about sets A and B are equivalent:

1. A ⊂ B.
2. A ∩ B = A.
3. A ∪ B = B.


Notice that property 4 of Theorem A.1 and properties 3, 4, and 5 of Theorem
A.2 can be viewed as corollaries of Theorem A.3, because ∅ ⊂ A ⊂ S, A ⊂ A,
A ∩ B is a subset of both A and B, A and B are both subsets of A ∪ B, and
A ∩ B ⊂ A ∪ B.
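"Equivalent" in Theorem A.3 means the three statements are either all true or all false for any pair (A, B). The Python sketch below checks this over every pair of subsets of an illustrative five-element set. Python's `<=` operator on sets is the subset test.

```python
import itertools

S = range(1, 6)  # small universal set, chosen for illustration
subsets = [frozenset(c) for r in range(len(S) + 1) for c in itertools.combinations(S, r)]

def statements(A, B):
    """The three statements of Theorem A.3 for the pair (A, B)."""
    return (A <= B, A & B == A, A | B == B)

# Equivalent statements are all true or all false together for every pair.
print(all(len(set(statements(A, B))) == 1
          for A, B in itertools.product(subsets, repeat=2)))
```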

The notions of intersection and union are extended easily to more than three
sets, but it is more convenient in this case to use subscripted set notation A_1, A_2,
..., A_n.

1. The intersection of A_1, A_2, ..., A_n is defined as

A_1 ∩ A_2 ∩ ··· ∩ A_n = {x | x ∈ A_i for all i = 1, 2, ..., n}

2. The union of A_1, A_2, ..., A_n is defined as

A_1 ∪ A_2 ∪ ··· ∪ A_n = {x | x ∈ A_i for at least one i = 1, 2, ..., n}

More concise notations for these expressions are, respectively, ⋂_{i=1}^{n} A_i and
⋃_{i=1}^{n} A_i, and the terms finite intersection and finite union, respectively, usually are
applied to them.

There are counterparts, in the case of n sets, to many of the properties in
Theorems A.1, A.2, and A.3, but they generally are harder to state. One property
that is very useful in the area of probability is a generalization of the distributive
law.

Theorem A.4 If A_1, A_2, ..., A_n and B are subsets of S, then the following equations are true:

1. B ∩ (A_1 ∪ A_2 ∪ ··· ∪ A_n) = (B ∩ A_1) ∪ (B ∩ A_2) ∪ ··· ∪ (B ∩ A_n).
2. B ∪ (A_1 ∩ A_2 ∩ ··· ∩ A_n) = (B ∪ A_1) ∩ (B ∪ A_2) ∩ ··· ∩ (B ∪ A_n).


Property 1 is the more frequently used of the two statements, because it provides
a way to partition a set B into subsets. In particular, suppose that A_1, A_2,
..., A_n are pairwise disjoint sets (A_i ∩ A_j = ∅ if i ≠ j), which also are exhaustive
in the sense that A_1 ∪ A_2 ∪ ··· ∪ A_n = S. It can be established from the preceding
theorems that

B = (B ∩ A_1) ∪ (B ∩ A_2) ∪ ··· ∪ (B ∩ A_n)

which partitions B into disjoint sets B ∩ A_1, B ∩ A_2, ..., B ∩ A_n. This partitioning
also is seen easily by means of an appropriate Venn diagram, such as
that in Figure A.3.
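Here is a concrete instance of this partitioning in Python. The sets S, A_1, A_2, A_3, and B below are hypothetical examples of our own. The A_i are pairwise disjoint with union S, and intersecting B with each one splits B into disjoint pieces whose union is B.

```python
S = set(range(1, 13))
# A hypothetical partition of S into pairwise disjoint, exhaustive sets A_1, A_2, A_3.
A = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9, 10, 11, 12}]
B = {2, 3, 6, 9, 12}

assert set().union(*A) == S          # exhaustive

pieces = [B & A_i for A_i in A]      # B intersected with each A_i

# The pieces are pairwise disjoint, and their union recovers B.
pairwise_disjoint = all(not (pieces[i] & pieces[j])
                        for i in range(len(pieces)) for j in range(i + 1, len(pieces)))
print(pieces)
print(pairwise_disjoint, set().union(*pieces) == B)
```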
In the development of probability it often is necessary to consider higher
dimensional or vector quantities. The notion of Cartesian products is useful in
this regard.

If A and B are two sets, then the Cartesian product of A and B, denoted by
A × B, is defined to be the following set of ordered pairs:

A × B = {(x, y) | x ∈ A and y ∈ B}

For example, if A and B are the closed intervals A = [1, 3] = {x | x is real and
1 ≤ x ≤ 3} and B = [1, 2] = {y | y is real and 1 ≤ y ≤ 2}, then A × B can be
represented as a rectangle in the xy-plane, as shown in Figure A.4.

Notice that if we associate A and B with the corresponding Cartesian product
sets A* = A × (−∞, ∞) and B* = (−∞, ∞) × B, then the Cartesian product
set A × B is identical to the intersection A* ∩ B*. This correspondence is useful
in certain probability problems in which an experiment consists of performing
two successive steps, such as tossing a coin twice or drawing two cards from a
deck.

Some problems also require higher-dimensional Cartesian product sets. If A_1,
A_2, ..., A_n are sets, then the n-fold Cartesian product consists of the following
set of n-tuples:

A_1 × A_2 × ··· × A_n = {(x_1, x_2, ..., x_n) | x_i ∈ A_i for all i = 1, 2, ..., n}
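For finite sets, Python's `itertools.product` enumerates a Cartesian product directly. The sketch below lists the sample space for tossing a coin twice and confirms that an n-fold product of finite sets has |A_1| · |A_2| ··· |A_n| elements. The particular sets are illustrative choices of ours. Infinite sets such as the interval [1, 3] cannot be enumerated this way.

```python
import itertools

coin = {"H", "T"}
two_tosses = set(itertools.product(coin, coin))   # coin x coin
print(sorted(two_tosses))   # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# 3-fold product A_1 x A_2 x A_3: the number of triples is |A_1| * |A_2| * |A_3|
A1, A2, A3 = {1, 2}, {"a", "b", "c"}, {True, False}
triples = list(itertools.product(A1, A2, A3))
print(len(triples) == len(A1) * len(A2) * len(A3))
```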


The question of how many elements are in a set often is of considerable importance
in probability applications. A set A is said to be finite if its elements correspond
in a one-to-one manner to the elements in a set of integers of the form
{1, 2, ..., n} for some positive integer n. It is said to be countably infinite if its
elements correspond in a one-to-one manner to the elements in the set of all
positive integers {1, 2, ...}. For example, the set of all positive even integers
{2, 4, 6, ...} is countably infinite, because each positive even integer has the form 2i for
some positive integer i. This establishes a one-to-one correspondence, i ↔ 2i.
Although the correspondence is harder to describe, the set of all integers also can
be put into one-to-one correspondence with the set of positive integers, and hence
is countably infinite. A set is said to be countable if it is either finite or countably
infinite.
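One standard correspondence between the positive integers and all the integers (one choice among many) lists the integers as 0, 1, −1, 2, −2, .... The Python sketch below implements this map and its inverse and checks on a finite range that they undo each other. The function names are ours.

```python
def to_integer(k: int) -> int:
    """Map positive integers 1, 2, 3, 4, 5, ... to integers 0, 1, -1, 2, -2, ..."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k // 2 if k % 2 == 0 else -((k - 1) // 2)

def to_positive(z: int) -> int:
    """Inverse map: positive z -> 2z, and z <= 0 -> 1 - 2z."""
    return 2 * z if z > 0 else 1 - 2 * z

print([to_integer(k) for k in range(1, 10)])   # [0, 1, -1, 2, -2, 3, -3, 4, -4]
```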
The notions of intersection and union are extended easily to a countably infinite
collection (or an infinite sequence) of sets.
1. The intersection of A_1, A_2, ... is defined as
A_1 ∩ A_2 ∩ ··· = {x | x ∈ A_i for all i = 1, 2, ...}
2. The union of A_1, A_2, ... is defined as

A_1 ∪ A_2 ∪ ··· = {x | x ∈ A_i for at least one i = 1, 2, ...}

More concise notations for these expressions are, respectively, ⋂_{i=1}^{∞} A_i and
⋃_{i=1}^{∞} A_i. These are called, respectively, countably infinite intersections and
countably infinite unions. As an example, let A_1 = {1}, and let A_i = {1, 2, ..., i} for
i = 2, 3, .... Then ⋂_{i=1}^{∞} A_i = {1} and ⋃_{i=1}^{∞} A_i = {1, 2, ...}. An intersection (or union) is
called a countable intersection (or union) if it is either a finite intersection (or
union) or a countably infinite intersection (or union).

Infinite sets that are not countably infinite are difficult to characterize in 
general. However, the only ones that will be of interest in this development are 
intervals, Cartesian products of intervals, and finite or countably infinite unions 
and intersections of such sets. 
TABLE B.1
Special discrete distributions

Name of Distribution | Notation and Parameters | Discrete pdf f(x) | Mean | Variance | MGF M_X(t)
Binomial | X ~ BIN(n, p); n = 1, 2, ...; 0 < p < 1; q = 1 − p | C(n, x) p^x q^(n−x); x = 0, 1, ..., n | np | npq | (pe^t + q)^n
Bernoulli | X ~ BIN(1, p); 0 < p < 1; q = 1 − p | p^x q^(1−x); x = 0, 1 | p | pq | pe^t + q
Negative Binomial | X ~ NB(r, p); 0 < p < 1; q = 1 − p; r = 1, 2, ... | C(x − 1, r − 1) p^r q^(x−r); x = r, r + 1, ... | r/p | rq/p^2 | [pe^t/(1 − qe^t)]^r
Geometric | X ~ GEO(p); 0 < p < 1; q = 1 − p | pq^(x−1); x = 1, 2, ... | 1/p | q/p^2 | pe^t/(1 − qe^t)
Hypergeometric | X ~ HYP(n, M, N); n = 1, 2, ..., N; M = 0, 1, ..., N | C(M, x) C(N − M, n − x)/C(N, n); x = 0, 1, ..., n | nM/N | n(M/N)(1 − M/N)(N − n)/(N − 1) | *
Poisson | X ~ POI(μ); 0 < μ | e^(−μ) μ^x/x!; x = 0, 1, ... | μ | μ | e^(μ(e^t − 1))
Discrete Uniform | X ~ DU(N); N = 1, 2, ... | 1/N; x = 1, 2, ..., N | (N + 1)/2 | (N^2 − 1)/12 | e^t(e^(Nt) − 1)/[N(e^t − 1)]

C(a, b) denotes the binomial coefficient a!/[b!(a − b)!].
* Not tractable.
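The mean and variance columns can be spot-checked by summing x f(x) and x^2 f(x) over the support. The Python sketch below does this for three rows using illustrative parameter values of our own choosing. The negative binomial support is truncated where its tail is negligible. The discrete uniform parameter is named K in the code only to avoid clashing with the hypergeometric N.

```python
import math

def moments(pmf, support):
    """Mean and variance by direct summation of pmf over a (possibly truncated) support."""
    mean = sum(x * pmf(x) for x in support)
    second = sum(x * x * pmf(x) for x in support)
    return mean, second - mean ** 2

# Hypergeometric HYP(n, M, N); illustrative n = 5, M = 7, N = 20
n, M, N = 5, 7, 20

def hyp_pmf(x):
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

hyp_table = (n * M / N, n * (M / N) * (1 - M / N) * (N - n) / (N - 1))

# Negative binomial NB(r, p); support x = r, r + 1, ..., truncated at 399
r, p = 3, 0.4
q = 1 - p

def nb_pmf(x):
    return math.comb(x - 1, r - 1) * p ** r * q ** (x - r)

nb_table = (r / p, r * q / p ** 2)

# Discrete uniform DU(K) with K = 10
K = 10

def du_pmf(x):
    return 1 / K

du_table = ((K + 1) / 2, (K ** 2 - 1) / 12)

print(moments(hyp_pmf, range(n + 1)), hyp_table)
print(moments(nb_pmf, range(r, 400)), nb_table)
print(moments(du_pmf, range(1, K + 1)), du_table)
```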
TABLE B.2
Special continuous distributions

Name of Distribution | Notation and Parameters | Continuous pdf f(x) | Mean | Variance | MGF M_X(t)
Uniform | X ~ UNIF(a, b); a < b | 1/(b − a); a < x < b | (a + b)/2 | (b − a)^2/12 | (e^(bt) − e^(at))/[(b − a)t]
Normal | X ~ N(μ, σ^2); 0 < σ^2 | [1/(√(2π) σ)] e^(−[(x − μ)/σ]^2/2) | μ | σ^2 | e^(μt + σ^2 t^2/2)
Gamma | X ~ GAM(θ, κ); 0 < θ; 0 < κ | x^(κ−1) e^(−x/θ)/[θ^κ Γ(κ)]; 0 < x | κθ | κθ^2 | [1/(1 − θt)]^κ
Exponential | X ~ EXP(θ); 0 < θ | (1/θ) e^(−x/θ); 0 < x | θ | θ^2 | 1/(1 − θt)
Two-Parameter Exponential | X ~ EXP(θ, η); 0 < θ | (1/θ) e^(−(x − η)/θ); η < x | η + θ | θ^2 | e^(ηt)/(1 − θt)
Double-Exponential | X ~ DE(θ, η); 0 < θ | [1/(2θ)] e^(−|x − η|/θ) | η | 2θ^2 | e^(ηt)/(1 − θ^2 t^2)
Weibull | X ~ WEI(θ, β); 0 < θ; 0 < β | (β/θ^β) x^(β−1) e^(−(x/θ)^β); 0 < x | θΓ(1 + 1/β) | θ^2[Γ(1 + 2/β) − Γ^2(1 + 1/β)] | *
Extreme Value | X ~ EV(θ, η); 0 < θ | (1/θ) exp{[(x − η)/θ] − exp[(x − η)/θ]} | η − γθ, γ = 0.5772 (Euler's constant) | π^2 θ^2/6 | e^(ηt) Γ(1 + θt)
Cauchy | X ~ CAU(θ, η); 0 < θ | 1/(θπ{1 + [(x − η)/θ]^2}) | ** | ** | **
Pareto | X ~ PAR(θ, κ); 0 < θ; 0 < κ | κ/[θ(1 + x/θ)^(κ+1)]; 0 < x | θ/(κ − 1), 1 < κ | θ^2 κ/[(κ − 2)(κ − 1)^2], 2 < κ | **
TABLE B.2 (continued)

Name of Distribution | Notation and Parameters | Continuous pdf f(x) | Mean | Variance | MGF M_X(t)
Lognormal | X ~ LOGN(μ, σ^2); 0 < σ^2 | [1/(√(2π) x σ)] e^(−[(ln x − μ)/σ]^2/2); 0 < x | e^(μ + σ^2/2) | e^(2μ + σ^2)(e^(σ^2) − 1) | **
Logistic | X ~ LOG(θ, η); 0 < θ | exp[(x − η)/θ]/(θ{1 + exp[(x − η)/θ]}^2) | η | π^2 θ^2/3 | e^(ηt) πθt csc(πθt)
Chi-Square | X ~ χ^2(ν); ν = 1, 2, ... | x^(ν/2 − 1) e^(−x/2)/[2^(ν/2) Γ(ν/2)]; 0 < x | ν | 2ν | [1/(1 − 2t)]^(ν/2)
Student's t | X ~ t(ν); ν = 1, 2, ... | Γ((ν + 1)/2)/[Γ(ν/2) √(νπ)] (1 + x^2/ν)^(−(ν+1)/2) | 0, 1 < ν | ν/(ν − 2), 2 < ν | **
Snedecor's F | X ~ F(ν_1, ν_2); ν_1 = 1, 2, ...; ν_2 = 1, 2, ... | Γ((ν_1 + ν_2)/2)/[Γ(ν_1/2) Γ(ν_2/2)] (ν_1/ν_2)^(ν_1/2) x^(ν_1/2 − 1) (1 + ν_1 x/ν_2)^(−(ν_1+ν_2)/2); 0 < x | ν_2/(ν_2 − 2), 2 < ν_2 | 2ν_2^2(ν_1 + ν_2 − 2)/[ν_1(ν_2 − 2)^2(ν_2 − 4)], 4 < ν_2 | **
Beta | X ~ BETA(a, b); 0 < a; 0 < b | Γ(a + b)/[Γ(a)Γ(b)] x^(a−1)(1 − x)^(b−1); 0 < x < 1 | a/(a + b) | ab/[(a + b + 1)(a + b)^2] | *

* Not tractable.
** Does not exist.


TABLES OF 
DISTRIBUTIONS 




The authors thank the organizations mentioned below for granting permission to 
use certain materials in constructing some of the tables in Appendix C: 


Portions of Table 8 are excerpted from Table A-12C of Dixon & Massey, Intro- 
duction to Statistical Analysis, 2nd ed., McGraw-Hill Book Co., New York. 


Portions of Table 9 are excerpted from Table 1 of “Modified Cramér-von Mises
Statistics for Censored Data,” Biometrika, 63, 1976, with the permission of the 
Biometrika Trustees. 


Portions of Tables 10 and 11 are excerpted from Table 1 of “EDF Statistics for 
Goodness-of-Fit and Some Comparisons,” Journal of the American Statistical 
Association, 69, 1974. 


Portions of Tables 10 and 11 are excerpted from Table 1 of “Goodness-of-Fit for 
the Extreme-Value Distribution,” Biometrika, 64, 1977, with the permission of the 
Biometrika Trustees. 
TABLE 2 Poisson cumulative distribution function

F(x; μ) = Σ_{k=0}^{x} e^(−μ) μ^k/k!

                                      μ
 x     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
 0  0.9048  0.8187  0.7408  0.6703  0.6065  0.5488  0.4966  0.4493  0.4066  0.3679
 1  0.9953  0.9825  0.9631  0.9384  0.9098  0.8781  0.8442  0.8088  0.7725  0.7358
 2  0.9998  0.9989  0.9964  0.9921  0.9856  0.9769  0.9659  0.9526  0.9371  0.9197
 3  1.0000  0.9999  0.9997  0.9992  0.9982  0.9966  0.9942  0.9909  0.9865  0.9810
 4          1.0000  1.0000  0.9999  0.9998  0.9996  0.9992  0.9986  0.9977  0.9963
 5                          1.0000  1.0000  1.0000  0.9999  0.9998  0.9997  0.9994
 6                                                  1.0000  1.0000  1.0000  0.9999

                                      μ
 x     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0    15.0
 0  0.1353  0.0498  0.0183  0.0067  0.0025  0.0009  0.0003  0.0001  0.0000  0.0000
 1  0.4060  0.1991  0.0916  0.0404  0.0174  0.0073  0.0030  0.0012  0.0005  0.0000
 2  0.6767  0.4232  0.2381  0.1247  0.0620  0.0296  0.0138  0.0062  0.0028  0.0000
 3  0.8571  0.6472  0.4335  0.2650  0.1512  0.0818  0.0424  0.0212  0.0103  0.0002
 4  0.9473  0.8153  0.6288  0.4405  0.2851  0.1730  0.0996  0.0550  0.0293  0.0009
 5  0.9834  0.9161  0.7851  0.6160  0.4457  0.3007  0.1912  0.1157  0.0671  0.0028
 6  0.9955  0.9665  0.8893  0.7622  0.6063  0.4497  0.3134  0.2068  0.1301  0.0076
 7  0.9989  0.9881  0.9489  0.8666  0.7440  0.5987  0.4530  0.3239  0.2202  0.0180
 8  0.9998  0.9962  0.9786  0.9319  0.8472  0.7291  0.5925  0.4557  0.3328  0.0374
 9  1.0000  0.9989  0.9919  0.9682  0.9161  0.8305  0.7166  0.5874  0.4579  0.0699
10          0.9997  0.9972  0.9863  0.9574  0.9015  0.8159  0.7060  0.5830  0.1185
11          0.9999  0.9991  0.9945  0.9799  0.9466  0.8881  0.8030  0.6968  0.1848
12          1.0000  0.9997  0.9980  0.9912  0.9730  0.9362  0.8758  0.7916  0.2676
13                  0.9999  0.9993  0.9964  0.9872  0.9658  0.9261  0.8645  0.3632
14                  1.0000  0.9998  0.9986  0.9943  0.9827  0.9585  0.9165  0.4657
15                          0.9999  0.9995  0.9976  0.9918  0.9780  0.9513  0.5681
16                          1.0000  0.9998  0.9990  0.9963  0.9889  0.9730  0.6641
17                                  0.9999  0.9996  0.9984  0.9947  0.9857  0.7489
18                                  1.0000  0.9999  0.9994  0.9976  0.9928  0.8195
19                                          1.0000  0.9997  0.9989  0.9965  0.8752
20                                                  0.9999  0.9996  0.9984  0.9170
21                                                  1.0000  0.9998  0.9993  0.9469
22                                                          0.9999  0.9997  0.9673
23                                                          1.0000  0.9999  0.9805
24                                                                  1.0000  0.9888
25                                                                          0.9938
26                                                                          0.9967
27                                                                          0.9983
28                                                                          0.9991
29                                                                          0.9996
30                                                                          0.9998
31                                                                          0.9999
32                                                                          1.0000
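Any entry of Table 2 can be reproduced by accumulating Poisson terms with the recursion f(k) = f(k − 1) μ/k. This avoids computing large powers and factorials separately. Here is a Python sketch. The helper name is ours, and the spot-check values are taken from the table. For example, the x = 0 entry for μ = 0.4 is e^(−0.4) ≈ 0.6703.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    """F(x; mu) = sum_{k=0}^{x} exp(-mu) mu**k / k!, accumulated term by term."""
    term = math.exp(-mu)   # k = 0 term
    total = term
    for k in range(1, x + 1):
        term *= mu / k      # f(k) = f(k-1) * mu / k
        total += term
    return total

for x, mu in [(0, 0.4), (3, 4.0), (10, 10.0), (20, 15.0)]:
    print(x, mu, f"{poisson_cdf(x, mu):.4f}")
```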


TABLE 3 
Standard normal cumulative distribution function Φ(z) and 100 × γth
percentiles z_γ

Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^(−t^2/2) dt

  z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

0.0  0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1  0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2  0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3  0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4  0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879

0.5  0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6  0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7  0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8  0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9  0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389

1.0  0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1  0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2  0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3  0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4  0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319

1.5  0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6  0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7  0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8  0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9  0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767

2.0  0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1  0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2  0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3  0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4  0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936

2.5  0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6  0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7  0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8  0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9  0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986

3.0  0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1  0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2  0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3  0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998

  γ     0.90    0.95   0.975    0.99   0.995   0.999  0.9995  0.99995  0.999995

z_γ    1.282   1.645   1.960   2.326   2.576   3.090   3.291    3.891     4.417
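Both parts of Table 3 can be regenerated with the Python standard library. Φ(z) = [1 + erf(z/√2)]/2, and a percentile z_γ is the root of Φ(z) = γ. Since Φ is increasing, bisection finds that root reliably. The sketch below is one simple way to do this, not necessarily how the printed table was produced.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt 2)) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def percentile(gamma: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """z_gamma solving Phi(z) = gamma, by bisection (Phi is increasing)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{phi(1.96):.4f}")            # 0.9750, the entry for z = 1.96
print(f"{percentile(0.975):.3f}")    # 1.960, the bottom row for gamma = 0.975
```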
TABLE 5 Cumulative distribution function H(c; ν) of the chi-square distribution
with ν degrees of freedom

H(c; ν) = ∫_0^c f(y; ν) dy

                                         ν
    c      1      2      3      4      5      6      7      8      9     10     11     12
 0.02  0.112  0.010  0.001
 0.06  0.194  0.030  0.004
 0.10  0.248  0.049  0.008  0.001
 0.20  0.345  0.095  0.022  0.005  0.001
 0.60  0.561  0.259  0.104  0.037  0.012  0.004  0.001
  1.0  0.683  0.393  0.199  0.090  0.037  0.014  0.005  0.002  0.001
  1.4  0.763  0.503  0.294  0.156  0.076  0.034  0.014  0.006  0.002  0.001
  1.8  0.820  0.593  0.385  0.229  0.124  0.063  0.030  0.013  0.006  0.002  0.001
  2.2  0.862  0.667  0.468  0.301  0.179  0.100  0.052  0.026  0.012  0.005  0.002  0.001
  2.6  0.893  0.727  0.543  0.373  0.239  0.143  0.081  0.043  0.022  0.011  0.005  0.002
  3.0  0.917  0.777  0.608  0.442  0.300  0.191  0.115  0.066  0.036  0.019  0.009  0.004
  3.4  0.935  0.817  0.666  0.507  0.361  0.243  0.154  0.093  0.054  0.030  0.016  0.008
  3.8  0.949  0.850  0.716  0.566  0.421  0.296  0.198  0.125  0.076  0.044  0.025  0.013
  4.2  0.960  0.878  0.759  0.620  0.479  0.350  0.244  0.161  0.102  0.062  0.036  0.020
  4.6  0.968  0.900  0.796  0.669  0.533  0.404  0.291  0.201  0.132  0.084  0.051  0.030
  5.0  0.975  0.918  0.828  0.713  0.584  0.456  0.340  0.242  0.166  0.109  0.069  0.042
  5.4  0.980  0.933  0.855  0.751  0.631  0.506  0.389  0.286  0.202  0.137  0.090  0.057
  5.8  0.984  0.945  0.878  0.785  0.674  0.554  0.437  0.330  0.240  0.168  0.114  0.074
  6.2  0.987  0.955  0.898  0.815  0.713  0.599  0.483  0.375  0.280  0.202  0.140  0.094
  6.6  0.990  0.963  0.914  0.841  0.748  0.641  0.528  0.420  0.321  0.237  0.170  0.117
  7.0  0.992  0.970  0.928  0.864  0.779  0.679  0.571  0.463  0.362  0.275  0.201  0.142
  7.4  0.994  0.975  0.940  0.884  0.807  0.715  0.612  0.506  0.404  0.313  0.234  0.170
  7.8  0.995  0.980  0.950  0.901  0.832  0.747  0.649  0.547  0.446  0.352  0.269  0.199
  8.2  0.996  0.983  0.958  0.915  0.854  0.776  0.685  0.586  0.486  0.391  0.305  0.231
  8.6  0.997  0.986  0.965  0.928  0.874  0.803  0.717  0.623  0.525  0.430  0.341  0.263
  9.0  0.997  0.989  0.971  0.939  0.891  0.826  0.747  0.658  0.563  0.468  0.378  0.297
 10.0  0.998  0.993  0.981  0.960  0.925  0.875  0.811  0.735  0.650  0.560  0.470  0.384
 11.0  0.999  0.996  0.988  0.973  0.949  0.912  0.861  0.798  0.724  0.642  0.557  0.471
 12.0  0.999  0.998  0.993  0.983  0.965  0.938  0.899  0.849  0.787  0.715  0.636  0.554
 13.0         0.998  0.995  0.989  0.977  0.957  0.928  0.888  0.837  0.776  0.707  0.631
 14.0         0.999  0.997  0.993  0.984  0.970  0.949  0.918  0.878  0.827  0.767  0.699
 15.0         0.999  0.998  0.995  0.990  0.980  0.964  0.941  0.909  0.868  0.818  0.759
 16.0                0.999  0.997  0.993  0.986  0.975  0.958  0.933  0.900  0.859  0.809
 17.0                0.999  0.998  0.996  0.991  0.983  0.970  0.951  0.926  0.892  0.850
 18.0                       0.999  0.997  0.994  0.988  0.979  0.965  0.945  0.918  0.884
 19.0                       0.999  0.998  0.996  0.992  0.985  0.975  0.960  0.939  0.911
 20.0                       0.999  0.999  0.997  0.994  0.990  0.982  0.971  0.955  0.933
 25.0                                            0.999  0.998  0.997  0.995  0.991  0.985
 30.0                                                                 0.999  0.998  0.997

TABLE 5 (continued)
9.0
10.0 
11.0 
12.0 
13.0 
14.0 
15.0 
16.0 
17.0 
18.0 
19.0 
20.0 
21.0 
22.0 
23.0 
24.0 
25.0 
26.0 
27.0
28.0 
29.0 
30.0 
35.0 
40.0 
50.0 
0.001
0.001 
0.002 
0.004 
0.006 
0.008 
0.012 
0.016 
0.022 
0.029 
0.038 
0.048 
0.060 
0.096 
0.143 
0.200 
0.264 
0.333 
0.405 
0.476 
0.546 
0.611 
0.671 
0.726 
0.774 
0.815 
0.851 
0.881 
0.905 
0.926 
0.942 
0.955 
0.965 
0.974 
0.993 
0.999 


0.001 
0.001 
0.002 
0.003 
0.005 
0.007 
0.010 
0.014 
0.019 
0.024 
0.032 
0.040 
0.068 
0.106 
0.153 
0.208 
0.271 
0.338 
0.407 
0.477 
0.544 
0.608 
0.667 
0.721 
0.768 
0.809 
0.845 
0.875 
0.900 
0.921 
0.938 
0.952 
0.963 
0.991 
0.998 


0.001 
0.001 
0.002 
0.003 
0.004 
0.006 
0.008 
0.011 
0.015 
0.020 
0.027 
0.047 
0.076 
0.114 
0.161 
0.216 
0.277 
0.343 
0.410 
0.478 
0.543 
0.605 
0.663 
0.716 
0.763 
0.804 
0.839 
0.870 
0.895 
0.917 
0.934 
0.948 
0.986
0.997 


0.001 
0.001 
0.002 
0.003 
0.005 
0.007 
0.011 
0.013 
0.017 
0.032 
0.054 
0.084
0.123 
0.169 
0.244 
0.283 
0.347 
0.413 
0.478 
0.542 
0.603 
0.659 
0.711 
0.758 
0.799 
0.834 
0.865 
0.891 
0.912 
0.930 
0.980 
0.995 


0.001 
0.001 
0.002 
0.003 
0.004 
0.006 
0.008 
0.011 
0.021 
0.037 
0.060 
0.091 
0.130 
0.177 
0.230 
0.289 
0.351 
0.415 
0.479 
0.541 
0.600 
0.656 
0.707 
0.753 
0.794 
0.829 
0.860 
0.886 
0.908 
0.972 
0.993 


0.001 
0.001 
0.002 
0.002 
0.003 
0.005 
0.007 
0.014 
0.025 
0.043 
0.067 
0.099 
0.138 
0.184 
0.237 
0.294 
0.355 
0.417 
0.479 
0.540 
0.598 
0.653 
0.703 
0.748 
0.789 
0.824 
0.855 
0.882 
0.961 
0.989 
0.999 


0.001 
0.001 
0.001 
0.002 
0.003 
0.004 
0.009 
0.017 
0.030 
0.048 
0.073 
0.105 
0.145 
0.191 
0.243 
0.299 
0.358 
0.419 
0.480 
0.539 
0.596 
0.650 
0.699 
0.744 
0.784 
0.820 
0.851 
0.948 
0.985 
0.999 


For large ν, H(c; ν) ≈ Φ(z), where z = [(c/ν)^(1/3) − 1 + 2/(9ν)]/[2/(9ν)]^(1/2).


0.001 
0.001 
0.002 
0.002 
0.005 
0.011 
0.020 
0.034 
0.053 
0.079 
0.112 
0.151 
0.197 
0.248 
0.303 
0.361 
0.421 
0.480 
0.538 
0.594 
0.647 
0.696 
0.740 
0.780 
0.815 
0.932 
0.979 
0.999 
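For even ν, H(c; ν) has a closed form in terms of the Poisson CDF of Table 2: H(c; ν) = 1 − F(ν/2 − 1; c/2). The large-ν approximation in the note under Table 5 (the Wilson-Hilferty transformation) needs only Φ. The Python sketch below compares the two on a few entries of the table. The function names are ours.

```python
import math

def chi2_cdf_even(c: float, nu: int) -> float:
    """Exact H(c; nu) for even nu: 1 - P[POI(c/2) <= nu/2 - 1]."""
    if nu % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    mu = c / 2
    term, tail = math.exp(-mu), 0.0
    for k in range(nu // 2):
        tail += term            # adds the Poisson probability of exactly k events
        term *= mu / (k + 1)    # next term: P[POI(mu) = k + 1]
    return 1.0 - tail

def chi2_cdf_wilson_hilferty(c: float, nu: int) -> float:
    """Large-nu approximation H(c; nu) ~ Phi(z) from the note under Table 5."""
    a = 2.0 / (9.0 * nu)
    z = ((c / nu) ** (1.0 / 3.0) - 1.0 + a) / math.sqrt(a)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for c, nu in [(10.0, 4), (12.0, 12), (20.0, 12)]:
    print(c, nu, round(chi2_cdf_even(c, nu), 3), round(chi2_cdf_wilson_hilferty(c, nu), 3))
```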
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TABLE 6 100 × γth percentiles t_γ(ν) of Student's t distribution with ν degrees of freedom

$$\gamma = \int_{-\infty}^{t_\gamma(\nu)} f(t;\nu)\,dt$$

                                         γ
   ν    0.60   0.70   0.80   0.90   0.95  0.975    0.99   0.995   0.9995
   1   0.325  0.727  1.376  3.078  6.314 12.706  31.821  63.657  636.619
   2   0.289  0.617  1.061  1.886  2.920  4.303   6.965   9.925   31.598
   3   0.277  0.584  0.978  1.638  2.353  3.182   4.541   5.841   12.924
   4   0.271  0.569  0.941  1.533  2.132  2.776   3.747   4.604    8.610
   5   0.267  0.559  0.920  1.476  2.015  2.571   3.365   4.032    6.869
   6   0.265  0.553  0.906  1.440  1.943  2.447   3.143   3.707    5.959
   7   0.263  0.549  0.896  1.415  1.895  2.365   2.998   3.499    5.408
   8   0.262  0.546  0.889  1.397  1.860  2.306   2.896   3.355    5.041
   9   0.261  0.543  0.883  1.383  1.833  2.262   2.821   3.250    4.781
  10   0.260  0.542  0.879  1.372  1.812  2.228   2.764   3.169    4.587
  11   0.260  0.540  0.876  1.363  1.796  2.201   2.718   3.106    4.437
  12   0.259  0.539  0.873  1.356  1.782  2.179   2.681   3.055    4.318
  13   0.259  0.538  0.870  1.350  1.771  2.160   2.650   3.012    4.221
  14   0.258  0.537  0.868  1.345  1.761  2.145   2.624   2.977    4.140
  15   0.258  0.536  0.866  1.341  1.753  2.131   2.602   2.947    4.073
  16   0.258  0.535  0.865  1.337  1.746  2.120   2.583   2.921    4.015
  17   0.257  0.534  0.863  1.333  1.740  2.110   2.567   2.898    3.965
  18   0.257  0.534  0.862  1.330  1.734  2.101   2.552   2.878    3.922
  19   0.257  0.533  0.861  1.328  1.729  2.093   2.539   2.861    3.883
  20   0.257  0.533  0.860  1.325  1.725  2.086   2.528   2.845    3.850
  21   0.257  0.532  0.859  1.323  1.721  2.080   2.518   2.831    3.819
  22   0.256  0.532  0.858  1.321  1.717  2.074   2.508   2.819    3.792
  23   0.256  0.532  0.858  1.319  1.714  2.069   2.500   2.807    3.767
  24   0.256  0.531  0.857  1.318  1.711  2.064   2.492   2.797    3.745
  25   0.256  0.531  0.856  1.316  1.708  2.060   2.485   2.787    3.725
  26   0.256  0.531  0.856  1.315  1.706  2.056   2.479   2.779    3.707
  27   0.256  0.531  0.855  1.314  1.703  2.052   2.473   2.771    3.690
  28   0.256  0.530  0.855  1.313  1.701  2.048   2.467   2.763    3.674
  29   0.256  0.530  0.854  1.311  1.699  2.045   2.462   2.756    3.659
  30   0.256  0.530  0.854  1.310  1.697  2.042   2.457   2.750    3.646
  40   0.255  0.529  0.851  1.303  1.684  2.021   2.423   2.704    3.551
  60   0.254  0.527  0.848  1.296  1.671  2.000   2.390   2.660    3.460
 120   0.254  0.526  0.845  1.289  1.658  1.980   2.358   2.617    3.373
   ∞   0.253  0.524  0.842  1.282  1.645  1.960   2.326   2.576    3.291
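Entries of the t table can be spot-checked numerically. The sketch below is only an illustration under stated assumptions, not how the table was produced: it integrates the Student's t density f(t; ν) with composite Simpson's rule and solves F(t; ν) = γ by bisection. The function names `t_pdf`, `t_cdf` and `t_percentile`, the 2000-step grid, and the tolerance are all choices made for this example. Extreme tails with small ν (for example γ = 0.9995, ν = 1) would need a finer grid.

```python
import math


def t_pdf(t, nu):
    """Density f(t; nu) of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """F(x; nu) = 1/2 + integral of f(t; nu) from 0 to x, by composite Simpson's rule."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


def t_percentile(gamma, nu, hi=1000.0, tol=1e-6):
    """t_gamma(nu): solve F(t; nu) = gamma by bisection on [0, hi]."""
    if gamma < 0.5:
        # Symmetry of the t density: t_{1-gamma}(nu) = -t_gamma(nu).
        return -t_percentile(1 - gamma, nu, hi, tol)
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for nu, gamma in [(10, 0.95), (10, 0.975), (15, 0.90)]:
    print(f"t_{gamma}({nu}) = {t_percentile(gamma, nu):.3f}")
```

Printed to three decimals, these three evaluations give 1.812, 2.228 and 1.341.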
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TABLE 8 Sample size for t test

Sample size n to achieve power 1 − β for d = |μ − μ0|/σ in the one-sample case and n = n1 = n2 for d′ = |μ1 − μ2|/σ in the two-sample case, for a one-sided test at significance level α. These are approximate n for a two-sided test at significance level 2α.

Two-Sample Test

 α (2α)           d′    1 − β = 0.90    0.95    0.99
 0.005 (0.01)     0.1          2977     3567    4806
                  0.2           749      894    1206
                  0.4           189      226     304
                  0.6            85      100     138
                  0.8            49       56      85
                  1.0            32       38      49
                  1.2            23       27      36
                  1.4            18       20      27
                  1.6            14       16      21
                  1.8            11       13      17
                  2.0            10       11      14
                  3.0             6        6       8
 0.0125 (0.025)   0.1          2482     3020    4180
                  0.2           629      766    1057
                  0.4           157      191     265
                  0.6            71       84     117
                  0.8            40       49      69
                  1.0            27       32      44
                  1.2            19       23      31
                  1.4            15       17      23
                  1.6            12       14      18
                  1.8            10       11      16
                  2.0             8        3      12
                  3.0             5        5       7
 0.025 (0.05)     0.1          2106     2603    3680
                  0.2           527      650     922
                  0.4           133      164     231
                  0.6            60       77     104
                  0.8            34       42      59
                  1.0            22       28      38
                  1.2            16       19      27
                  1.4            12       15      20
                  1.6            10       12      16
                  1.8             8       10      13
                  2.0             7        8      11
                  3.0             4        5       6
 0.05 (0.10)      0.1          1715     2166    3160
                  0.2           430      543     793
                  0.4           109      137     199
                  0.6            48       62      89
                  0.8            28       35      50
                  1.0            18       23      33
                  1.2            13       16      23
                  1.4            10       12      17
                  1.6             8       10      14
                  1.8             7        8      11
                  2.0             6        7       9
                  3.0             3        4       5
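The table does not say how its entries were computed. A common normal-theory approximation, used here only as an assumption and not necessarily the method behind Table 8, is n ≈ 2[(z_{1−α} + z_{1−β})/d′]² + z²_{1−α}/4 in the two-sample case and n ≈ [(z_{1−α} + z_{1−β})/d]² + z²_{1−α}/2 in the one-sample case, rounded up. The sketch evaluates these with the standard library's `statistics.NormalDist`. The function names are hypothetical. It reproduces some two-sample entries exactly, for example 22 (α = 0.025, d′ = 1.0, 1 − β = 0.90) and 2166 (α = 0.05, d′ = 0.1, 1 − β = 0.95), and lands within a few units of others.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_two_sample(alpha, power, d):
    """Approximate n = n1 = n2 for a one-sided two-sample t test at level alpha."""
    z_a = _Z.inv_cdf(1 - alpha)  # upper alpha point
    z_b = _Z.inv_cdf(power)      # power = 1 - beta
    return math.ceil(2 * ((z_a + z_b) / d) ** 2 + z_a ** 2 / 4)


def n_one_sample(alpha, power, d):
    """Analogous approximation for the one-sample t test."""
    z_a = _Z.inv_cdf(1 - alpha)
    z_b = _Z.inv_cdf(power)
    return math.ceil(((z_a + z_b) / d) ** 2 + z_a ** 2 / 2)


print(n_two_sample(0.025, 0.90, 1.0))
print(n_two_sample(0.05, 0.95, 0.1))
print(n_one_sample(0.005, 0.95, 0.8))
```

The one-sample call gives 32, which agrees with the sample size quoted in the Chapter 12 answer to exercise 5 (d = 0.8, 1 − β = 0.95, two-sided level 0.01).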


TABLE 9 


TABLE 10


TABLE 11
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Upper critical values CM_{1−α} for completely specified H0

 r/n   α = 0.10    0.05    0.025    0.01
 0.5       0.189   0.258   0.330    0.427
 0.6       0.241   0.327   0.417    0.539
 0.7       0.286   0.386   0.491    0.633
 0.8       0.321   0.430   0.544    0.700
 0.9       0.341   0.455   0.573    0.735
 1.0       0.347   0.461   0.581    0.743

                                      1 − α
 H0           Statistic             0.90    0.95    0.975   0.99
 EXP(θ)       (1 + 0.16/n)CM        0.177   0.224   0.273   0.337
 WEI(θ, β)    (1 + 0.2/√n)CM        0.102   0.124   0.146   0.175
 N(μ, σ²)     (1 + 0.5/n)CM         0.104   0.126   0.148   0.178

Critical values for KS and Kuiper statistics

                                                    1 − α
 H0           Statistic                          0.90    0.95    0.975   0.99
 F(x)         (√n + 0.12 + 0.11/√n)D             1.224   1.358   1.480   1.628
 EXP(θ)       (√n + 0.26 + 0.5/√n)(D − 0.2/n)    0.995   1.094   1.184   1.298
 N(μ, σ²)     (√n − 0.01 + 0.85/√n)D             0.819   0.895   0.955   1.035
 WEI(θ, β)    √n D                               0.803   0.874   0.939   1.007
 F(x)         (√n + 0.155 + 0.24/√n)V            1.620   1.747   1.862   2.001
 EXP(θ)       (√n + 0.24 + 0.35/√n)(V − 0.2/n)   1.527   1.655   1.774   1.910
 N(μ, σ²)     (√n + 0.05 + 0.82/√n)V             1.386   1.489   1.585   1.693
 WEI(θ, β)    √n V                               1.372   1.477   1.557   1.671
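For a completely specified F(x), the modified KS and Kuiper statistics in this table can be computed as follows. The sketch assumes the usual definitions D⁺ = max(i/n − u_(i)), D⁻ = max(u_(i) − (i − 1)/n) with u_(i) = F(x_(i)), D = max(D⁺, D⁻) and V = D⁺ + D⁻. The helper names and the three-point sample are made up for illustration.

```python
import math


def ks_kuiper(sample, cdf):
    """Return (D, V) for a sample against a completely specified CDF F(x)."""
    u = sorted(cdf(x) for x in sample)  # F is nondecreasing, so this orders F(x_(i))
    n = len(u)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    return max(d_plus, d_minus), d_plus + d_minus


def modified_d(d, n):
    """Modified KS statistic (sqrt(n) + 0.12 + 0.11/sqrt(n)) D."""
    rn = math.sqrt(n)
    return (rn + 0.12 + 0.11 / rn) * d


def modified_v(v, n):
    """Modified Kuiper statistic (sqrt(n) + 0.155 + 0.24/sqrt(n)) V."""
    rn = math.sqrt(n)
    return (rn + 0.155 + 0.24 / rn) * v


# Critical values for a completely specified F(x), keyed by 1 - alpha.
KS_CRIT = {0.90: 1.224, 0.95: 1.358, 0.975: 1.480, 0.99: 1.628}
KUIPER_CRIT = {0.90: 1.620, 0.95: 1.747, 0.975: 1.862, 0.99: 2.001}

sample = [0.1, 0.4, 0.7]
d, v = ks_kuiper(sample, lambda x: x)  # H0: UNIF(0, 1)
n = len(sample)
print("reject by KS at 0.05:", modified_d(d, n) > KS_CRIT[0.95])
print("reject by Kuiper at 0.05:", modified_v(v, n) > KUIPER_CRIT[0.95])
```

Here D = 0.3 and V = 0.4, and neither modified statistic exceeds its 0.95 critical value.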




TABLE 12
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Critical values t_α for the Wilcoxon signed-rank test, P[T ≤ t_α] ≤ α

                                 α
   n   0.005   0.01  0.025   0.05   0.10   0.20   0.50
   4       -      -      -      -      0      2      4
   5       -      -      -      0      2      3      7
   6       -      -      0      2      3      5     10
   7       -      0      2      3      5      8     13
   8       0      1      3      5      8     11     17
   9       1      3      5      8     10     14     22
  10       3      5      8     10     14     18     27
  11       5      7     10     13     17     22     32
  12       7      9     13     17     21     27     38
  13       9     12     17     21     26     32     45
  14      12     15     21     25     31     38     52
  15      15     19     25     30     36     44     59
  16      19     23     29     35     42     50     67
  17      23     27     34     41     48     57     76
  18      27     32     40     47     55     65     85
  19      32     37     46     53     62     73     94
  20      37     43     52     60     69     81    104
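Critical values like these can be regenerated from the exact null distribution of T. Under H0 each rank 1, …, n enters T independently with probability 1/2, so the number of the 2ⁿ sign patterns giving T = t comes from a subset-sum count. A minimal sketch, with hypothetical function names, returning the largest t with P[T ≤ t] ≤ α (or None when even P[T ≤ 0] > α):

```python
from fractions import Fraction


def signed_rank_counts(n):
    """counts[t] = number of the 2**n equally likely sign patterns with T = t."""
    max_t = n * (n + 1) // 2
    counts = [1] + [0] * max_t
    for k in range(1, n + 1):
        # Rank k either contributes k to T or does not (0/1 subset-sum update).
        for t in range(max_t, k - 1, -1):
            counts[t] += counts[t - k]
    return counts


def signed_rank_critical(n, alpha):
    """Largest t with P[T <= t] <= alpha, or None if no such t exists."""
    alpha = Fraction(str(alpha))  # exact decimal, avoids float ties at alpha
    total = 2 ** n
    cumulative, best = 0, None
    for t, c in enumerate(signed_rank_counts(n)):
        cumulative += c
        if Fraction(cumulative, total) > alpha:
            break
        best = t
    return best


print([signed_rank_critical(10, a) for a in (0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.50)])
```

Exact fractions are used because some entries sit exactly on the boundary. For n = 14, P[T ≤ 52] equals 1/2, which makes 52 the 0.50 entry.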
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TABLE 13A Values of P[U ≤ u] for the Mann-Whitney statistic with m = min(n1, n2), n = max(n1, n2), 2 ≤ m ≤ n ≤ 8
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TABLE 13B Critical values u_α such that P[U ≤ u_α] ≤ α for the Mann-Whitney statistic with m = min(n1, n2), n = max(n1, n2), 2 ≤ m ≤ n ≤ 20

$$u_\alpha \approx \frac{n_1 n_2}{2} + z_\alpha \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
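Both the exact critical values and the large-sample formula for u_α can be sketched in a few lines. The exact count uses the standard recursion on the largest observation. If it is an X, it exceeds all n Y's and adds n to U; if it is a Y, it adds nothing. z_α is taken to be the 100α-th percentile of N(0, 1), so it is negative for α < 1/2. That reading of the formula, like the function names, is an assumption of this sketch.

```python
import math
from fractions import Fraction
from functools import lru_cache
from statistics import NormalDist


@lru_cache(maxsize=None)
def u_count(m, n, u):
    """Number of the C(m + n, m) orderings of m X's and n Y's with U = u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Largest observation is an X (adds n to U) or a Y (adds 0).
    return u_count(m - 1, n, u - n) + u_count(m, n - 1, u)


def u_critical(n1, n2, alpha):
    """Largest u with P[U <= u] <= alpha, or None if no such u exists."""
    m, n = min(n1, n2), max(n1, n2)
    total = math.comb(m + n, m)
    alpha = Fraction(str(alpha))
    cumulative, best = 0, None
    for u in range(m * n + 1):
        cumulative += u_count(m, n, u)
        if Fraction(cumulative, total) > alpha:
            break
        best = u
    return best


def u_normal_approx(n1, n2, alpha):
    """u_alpha ~ n1*n2/2 + z_alpha * sqrt(n1*n2*(n1 + n2 + 1)/12)."""
    z_alpha = NormalDist().inv_cdf(alpha)
    return n1 * n2 / 2 + z_alpha * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


print(u_critical(10, 10, 0.05), round(u_normal_approx(10, 10, 0.05), 1))
```

For n1 = n2 = 10 and α = 0.05 this gives the exact value 27 and the approximation 28.2.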
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p values for Spearman's rank correlation coefficient

P[R_s ≤ −r] = P[R_s ≥ r] = p

n = 3
  r      p
1.000  0.167
0.500  0.500

n = 4
  r      p
1.000  0.042
0.800  0.167
0.600  0.208
0.400  0.375
0.200  0.458
0.000  0.542

n = 5
  r      p
1.000  0.008
0.900  0.042
0.800  0.067
0.700  0.117
0.600  0.175
0.500  0.225
0.400  0.258
0.300  0.342
0.200  0.392
0.100  0.475
0.000  0.525

n = 6
  r      p
1.000  0.001
0.943  0.008
0.886  0.017
0.829  0.029
0.771  0.051
0.714  0.068
0.657  0.088
0.600  0.121
0.543  0.149
0.486  0.178
0.429  0.210
0.371  0.249
0.314  0.282
0.257  0.329
0.200  0.357
0.143  0.401
0.086  0.460
0.029  0.500

n = 7
  r      p
1.000  0.000
0.964  0.001
0.929  0.003
0.893  0.006
0.857  0.012
0.821  0.017
0.786  0.024
0.750  0.033
0.714  0.044
0.679  0.055
0.643  0.069
0.607  0.083
0.571  0.100
0.536  0.118
0.500  0.133
0.464  0.151
0.429  0.177
0.393  0.198
0.357  0.222
0.321  0.249
0.286  0.278
0.250  0.297
0.214  0.331
0.179  0.357
0.143  0.391
0.107  0.420
0.071  0.453
0.036  0.482
0.000  0.518

n = 8
  r      p
1.000  0.000
0.976  0.000
0.952  0.001
0.929  0.001
0.905  0.002
0.881  0.004
0.857  0.005
0.833  0.008
0.810  0.011
0.786  0.014
0.762  0.018
0.738  0.023
0.714  0.029
0.690  0.035
0.667  0.042
0.643  0.048
0.619  0.057
0.595  0.066
0.571  0.076
0.548  0.085
0.524  0.098
0.500  0.108
0.476  0.122
0.452  0.134
0.429  0.150
0.405  0.163
0.381  0.180
0.357  0.195
0.333  0.214
0.310  0.231
0.286  0.250
0.262  0.268
0.238  0.291
0.214  0.310
0.190  0.332
0.167  0.352
0.143  0.376
0.119  0.397
0.095  0.420
0.071  0.441
0.048  0.467
0.024  0.488
0.000  0.512

n = 9
  r      p
1.000  0.000
0.983  0.000
0.967  0.000
0.950  0.000
0.933  0.000
0.917  0.001
0.900  0.001
0.883  0.002
0.867  0.002
0.850  0.003
0.833  0.004
0.817  0.005
0.800  0.007
0.783  0.009
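For small n these p values can be checked by brute force. Under independence all n! rankings are equally likely, and R_s = 1 − 6Σd²/[n(n² − 1)]. The sketch enumerates the rankings with `itertools.permutations`, which is practical only up to about n = 9. Because tabulated r values are rounded to three decimals, it accepts R_s ≥ r − 0.0005. That tolerance and the function names are assumptions of this example.

```python
from fractions import Fraction
from itertools import permutations


def spearman_rs(perm):
    """R_s = 1 - 6 * sum(d_i**2) / (n (n**2 - 1)) for ranks 0..n-1 paired with perm."""
    n = len(perm)
    s = sum((i - p) ** 2 for i, p in enumerate(perm))
    return 1 - Fraction(6 * s, n * (n * n - 1))


def spearman_upper_p(n, r):
    """Exact P[R_s >= r] under independence, enumerating all n! rankings."""
    threshold = Fraction(str(r)) - Fraction(1, 2000)  # allow for 3-decimal rounding of r
    perms = list(permutations(range(n)))
    hits = sum(1 for perm in perms if spearman_rs(perm) >= threshold)
    return hits / len(perms)


for n, r in [(4, 0.8), (5, 0.9), (6, 0.771)]:
    print(n, r, round(spearman_upper_p(n, r), 3))
```

For n = 6 and r = 0.771, the enumeration finds 37 of the 720 rankings, so p = 37/720 ≈ 0.051.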
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TABLE 15

Asymptotic variances and covariance a11, a12, and a22 of √r ξ̂/δ and √r δ̂/δ for MLEs ξ̂ and δ̂, and c = 2/[(1 + p²)² a22]

 p = r/n   1.0      0.9      0.8      0.7      0.6      0.5      0.4      0.3      0.2      0.1      0
 c         0.82     0.88     1.0      1.15     1.31     1.49     1.67     1.83     1.95     2.01     2.00
 a11       1.10867  1.03652  1.00209  1.01308  1.08718  1.25512  1.57321  2.15713  3.29575  6.05171
 a22       0.60793  0.69034  0.74255  0.78571  0.82367  0.85809  0.88990  0.91965  0.94775  0.97447
 a12      -0.25702 -0.15877 -0.03943  0.10138  0.26796  0.46788  0.71421  1.03158  1.47506  2.21872
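As a consistency check, the c row of this table can be recomputed from the a22 row with c = 2/[(1 + p²)² a22], where p = r/n. The list names below are illustrative, and the values are typed in from the table. p = 0 is skipped because a22 is not tabulated there.

```python
# p = r/n columns and the a22 row of the table (p = 0 omitted).
P_VALUES = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
A22 = [0.60793, 0.69034, 0.74255, 0.78571, 0.82367,
       0.85809, 0.88990, 0.91965, 0.94775, 0.97447]


def c_from_a22(p, a22):
    """c = 2 / [(1 + p**2)**2 * a22]."""
    return 2 / ((1 + p * p) ** 2 * a22)


c_row = [round(c_from_a22(p, a), 2) for p, a in zip(P_VALUES, A22)]
print(c_row)
```

Rounded to two decimals, the recomputed values agree with every tabulated c from p = 1.0 down to p = 0.1.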


TABLE 16

Values of k_n, n Var(δ̃/δ), n Cov(ξ̃/δ, δ̃/δ), and n Var(ξ̃/δ) for simple estimators ξ̃ and δ̃

 n      2     3     4     5     6     7     8     9     10    15    30    60    ∞
k, 069° 098° 115° 4.275 1.35 5-918 1.28°°-1,31- (1365-7405 150 1.53 1,87 
n Var(3/8) 41.42.1004 0.92 086-083. 080 077. 075° 074 071. 068 066 0.65 
n Cov(2/6, 4/8) 013° 0.06 -0.12 -0.14°--0.14  -0.19> -0.19 . -0.20- -0.20  -0.21°° -0.22 -0.23 ~0,23 
n Var(é/8) 1.32.9 1.235. 1.21120 1.20°°°146 > 1460 -49750°1447°° 146" «117°°«116~=~—«1.16 


TABLE 17

Coefficients for the quadratic approximations k_{r,n} = k0 + k1/n + k2/n², c_{r,n} = c0 + c1/n + c2/n², d_{r,n} = d0 + d1/n + d2/n², a_{r,n} = a0 + a1/n + a2/n²

 r/n   0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
 k0    0.10265  0.21129  0.32723  0.45234  0.58937  0.74274  0.92026  1.1382   1.4436
 k1   -1.0271  -1.0622  -1.1060  -1.1634  -1.2415  -1.1340  -1.5313  -1.8567  -2.6929
 k2    0.000    0.030    0.054    0.089    0.145    0.242    0.433    0.906    2.796
 c0   -2.2504  -1.4999  -1.0309  -0.6717  -0.3665  -0.0874   0.1856   0.4759   0.8340
 c1   -5.5743  -3.070   -2.2859  -1.9301  -1.7619  -1.7114  -1.7727  -2.0110  -2.7773
 c2   -7.201   -1.886   -0.767   -0.335   -0.091   -0.111   -0.369   -0.891   -2.825
 d0    0.25973  0.27113  0.28480  0.30160  0.32305  0.35188  0.39384  0.46402  0.62397
 d1   -0.1259  -0.1436  -0.1681  -0.2026  -0.2537  -0.3365  -0.4887  -0.8394  -2.1509
 d2    0.044    0.046    0.067    0.102    0.162    0.280    0.550    1.383    5.934
 a0    0.2052   0.4218   0.6514   0.8959   1.1577   1.4391   1.7416   2.0598   2.3394
 a1   -2.052   -2.111   -2.175   -2.244   -2.314   -2.376   -2.390   -2.205   -0.856
 a2    0.000    0.008    0.002   -0.106   -0.064   -0.188   -0.526   -1.682   -7.928
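Each tabulated quantity is approximated by x_{r,n} ≈ x0 + x1/n + x2/n², with the coefficients read from the column for the given r/n. The sketch evaluates the r/n = 0.5 column at a chosen n. The dictionary is typed in by hand, its name is hypothetical, and this excerpt does not define what k, c, d, and a represent, so the code only evaluates the polynomial.

```python
# Coefficients (x0, x1, x2) from the r/n = 0.5 column of the table.
COEFFS_R_OVER_N_HALF = {
    "k": (0.58937, -1.2415, 0.145),
    "c": (-0.3665, -1.7619, -0.091),
    "d": (0.32305, -0.2537, 0.162),
    "a": (1.1577, -2.314, -0.064),
}


def quadratic_approx(coeffs, n):
    """x_{r,n} ~ x0 + x1/n + x2/n**2."""
    x0, x1, x2 = coeffs
    return x0 + x1 / n + x2 / n ** 2


for name, coeffs in COEFFS_R_OVER_N_HALF.items():
    print(name, round(quadratic_approx(coeffs, 20), 5))
```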


ANSWERS TO 
SELECTED EXERCISES 


CHAPTER 1


1 a. S = {r, g, b}
b. {r}, {g}, {b}, {r, g}, {r, b}, {g, b}, S, ∅  c. {b, g}  d. ∅


2 a. S = {(r, r), (r, b), (r, g), (b, r), (b, b), (b, g), (g, r), (g, b), (g, g)}
b. 9
c. C1 = {(r, r), (r, b), (r, g)} = {(r, r)} ∪ {(r, b)} ∪ {(r, g)}
C2 = {(r, r), (r, b), (r, g), (b, r), (g, r)} = {(r, r)} ∪ {(r, b)} ∪ {(r, g)} ∪ {(b, r)} ∪ {(g, r)}
C1 ∩ C2 = C1, C1′ ∩ C2 = {(b, r), (g, r)} = {(b, r)} ∪ {(g, r)}


3 a. (O, O), (O, A), (O, B), (O, AB),
(A, O), (A, A), (A, B), (A, AB),
(B, O), (B, A), (B, B), (B, AB),
(AB, O), (AB, A), (AB, B), (AB, AB)
b. (O, O), (O, A), (O, B), (O, AB), (A, A), (A, B), (A, AB), (B, B), (B, AB), (AB, AB)
c. (O, O), (A, A), (B, B), (AB, AB)


4 S = {r, br, gr, bbr, ggr, bgr, gbr, ...}
= {x | x = r or x = c1c2⋯ck r} where ci = b or g


5 a. S = {0, 1, 2, ...}  b. S = [0, ∞) = {t | t ≥ 0}
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


S = [0, 1] = {x | 0 ≤ x ≤ 1}
S = [0, ∞) = {t | t ≥ 0}


Yes. p1 + p2 + p3 = 1, pi ≥ 0
No. p1 + p2 + p3 + p4 > 1


a. 1/9  b. 1/73  c. 5/9  d. 1/3  e. 2/9  f. 5/9
a. 9/16  b. 1/4  c. 1/8

a. 1/3, 1/3, 1/3  b. 1/4, 1/4, 1/2  c. 6/11, 3/11, 2/11
a. 1/4  b. 15/16  c. 3/8  d. 5/16

S = {(t,9, (9), (6, 0, (4, 9, (@, a), (0), (6, 9, (6, a), (6, 0}, 2/3 
a. 2/3  b. 23/30  c. 7/30  d. 9/10

a. 7/8  b. 1/8

8.08 8.03 e. 02 

a. 0.2  b. 0.2  c. 0.3  d. 0.5

319/420 

@.3/5 b. 1/2 ©. 3/4 d. 3/10 #. 3/5 g. 1/2 
4.3/5 b. 3/5 @. 3/5 dd. 9/25 #. 3/5 g. 3/5 
a. 5/14  b. 15/28  c. 25/28  d. 3/28

a. 1/56  b. 15/56  c. 5/28  d. 3/8

1/3 

a. 12/51  b. 2/15

a. 1/5  b. 34/105

a. 0.66  b. 0.1212

a. 1/13  b. 53/715  c. 5/53

a. 5/12  b. 4/5

a. 29/1000  b. 10/29

a. 43/80  b. 28/43

a. 0.2  b. 1/3




27/50 
a. 25/64  b. 15/32  c. 55/64  d. 9/64

a. 47/60  b. 3/5

No. P(A1 ∩ A2 ∩ A3) = 0 ≠ 1/8 = P(A1)P(A2)P(A3)
a. 26!  b. 7,893,600  c. 11,881,376

a. 6,760,000  b. 3,407,040  c. 1,514,240

72 
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57 40 
52 a. 0.504  b. 0.496
53 


sea. (2) 0 (3) 0 (MN) 
a(t) & (EM (a) 


57 a. 8  b. 32  c. 27771


58 a. (3)  b. 1/7  c. 7!  d. 576
59 24,360 


60 10! − 2(9!)


61 a. 365". -3g5P, -c. ee d. 0.5073 
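Part d of answer 61, 0.5073, coincides with the classical birthday-problem probability that at least two of n = 23 people share a birthday when all 365 days are equally likely. Reading the exercise that way is an assumption; the product formula below is the standard one, and the function name is hypothetical.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), all days equally likely."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days  # k-th person avoids the k days already used
    return 1 - p_all_distinct


print(round(p_shared_birthday(23), 4))
```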


62 a. 27,720 b. 27,720 
63 (26/9 D164) 

64 Same as 63 

65 a. 60  b. 13  c. 170
66 (60 )/(151(201)(25 1) 


87 a. 3? b. (O/B) 

68 a. 11,550 b. (12!(31)/14! 

69 a. 126  b. 0.0397  c. 3024

71 a. 0.9722  b. 0.6475

72 a. No  b. Yes  c. Yes. B, D, and C, D
CHAPTER 2 


1 a.  y:     2     3     4     5     6     7     8
      f(y):  1/16  1/8   3/16  1/4   3/16  1/8   1/16

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w:       0     1     4     9
f_W(w):  1/4   3/8   1/4   1/8


F(x): 1/12  1/4  5/12  7/12  3/4  11/12  1
d. 7/12  e. 1/2
3 a. 


x o 1 2 3 
fe) | 4121/4 1/4 BD 
4 f(x) = (x − 1)/10 if x = 2, 3, 4, or 5

a. k=8/7 b. No 

7 a. c= 1/33 




0 x <0, 
8/33 O0<x<il 
5/il t<x<2 
b. F,(x)=< ll 2<x<3 
26/33 3<x<4 
1/ll 4<x<5 
1 5 <x 
e,. 4/il 
8 a. f(x) = (1/2)^{x+1}; x = 0, 1, 2, ...  b. 0.000488  c. 2/3
9 a. Yes  b. Yes  c. No
11 $3.50
12 a. 12  b. 3/5
13 a. k > 0  b. F(x) = 1 − x^{−k}; 1 < x < ∞  c. k > 1
14 a. No  b. Yes  c. No


15 a. f(x) = (x + 1)/8; −1 < x < 3, zero otherwise
b. f(x) = λ²x e^{−λx}; 0 < x < ∞, zero otherwise
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0 1/2 1 3/2 


c. 1/4  d. 3/4  e. 3/4  f. 90
18 a. F(x) = x²/9; 0 ≤ x ≤ 3, zero if x < 0, one if x > 3
b. 4/9  c. 1/4  d. 3√2/2  e. 2  f. 1/2
19 a. 5/4
20 a. 3/2. b. flix) = AL—1/x?) ift<x<2 
0 x<0 
22 FAx)= = O<x<l 
1 l<x 
a 
(+ 4e4) so 
F(x) = eae O<x<1 
1+4e7! — 4e-* ee 
t+4e7! 7 
ae oe e! 
23 a. 30/8  b. 172/64  c. 21/2
24 a. 3/4  b. 3/80  c. 3/(r + 3)  d. 25/4
25 a. No  b. Yes  c. k < 1
26 a. 1.9  b. 1.29  c. Y = 35X − 40, 26.50, 1580.25
27 a. 1/2  b. 2  c. 3π/10
31 a. Bound = −7/5; not useful  b. Bound = 2/5  c. Exact probability = 7/8
32 a. 25/8  b. 55/64
34 a. &. 1/4 
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| 36 a. M,t)=e"*(1—8) b. —1,2 
37 6, 13 
38 a. Y=35S—10c b. 3 


CHAPTER 3 

1 0.008

2 a. 0.000977  b. 0.0439

3 a. 0.0439  b. 0.1209

4 a. 1 − 4q³ + 3q⁴  b. 1 − q²


Two engines safer if q > 1/3; four engines safer if q < 1/3; equally safe if q = 1/3
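The comparison follows from the two reliability formulas in answer 4. Taking, as those formulas suggest, that each engine fails independently with probability q, the four-engine plane succeeds with probability 1 − 4q³ + 3q⁴ and the two-engine plane with probability 1 − q². Their difference factors as q²(1 − q)(1 − 3q), which is positive exactly when 0 < q < 1/3. The sketch checks this with exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def p_four_engine(q):
    """Success probability 1 - 4q^3 + 3q^4 (from answer 4a)."""
    return 1 - 4 * q ** 3 + 3 * q ** 4


def p_two_engine(q):
    """Success probability 1 - q^2 (from answer 4b)."""
    return 1 - q ** 2


for q in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
    four, two = p_four_engine(q), p_two_engine(q)
    verdict = "four safer" if four > two else "two safer" if four < two else "equally safe"
    print(q, four, two, verdict)
```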
P(he wins) = 0.5177 > 0.5 (good bet)
P(he wins) = 0.4914


P(at least one 6 in six rolls) = 0.6651
P(at least two 6’s in twelve rolls) = 0.6187
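All four probabilities are binomial tail probabilities P(at least k successes in n trials). The last two follow directly from their descriptions. Matching the two "P(he wins)" values to the classical de Méré bets (at least one 6 in four rolls of one die; at least one double six in 24 rolls of two dice) is an assumption, although both values are reproduced. The function name is illustrative.

```python
from math import comb


def p_at_least(k, n, p):
    """P(at least k successes in n independent trials with success probability p)."""
    return 1 - sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k))


print(round(p_at_least(1, 4, 1 / 6), 4))    # at least one 6 in four rolls
print(round(p_at_least(1, 24, 1 / 36), 4))  # at least one double six in 24 rolls of two dice
print(round(p_at_least(1, 6, 1 / 6), 4))    # at least one 6 in six rolls
print(round(p_at_least(2, 12, 1 / 6), 4))   # at least two 6's in twelve rolls
```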




6 a. 0.2182  b. 0.6292  c. E(X) = 4, Var(X) = 32
8 a. 0.2007  b. 0.2151

9 a. 3/10  b. 2/3

10 a. 0.03993  b. 0.03993  c. 0.9583  d. 0.0417
11 a. 0.1517  b. 0.1759

12 a. 0.0465  b. 0.0465  c. 0.9494  d. 0.0506

13 a. 0.09  b. 0.1(0.9)^{x−1}  c. 0.729  d. 10
14 a. 1/4  b. 15/16

15 a. 0.01722  b. 0.999  c. 30


16 a. ("5 ‘josrosrssx=4 5,6,7 b. 0.2765 c. p=1/2 d.x=6 


17 a. 0.00729  b. 0.99144  c. 0.271
20 a. 0.0335  b. 0.8385
21 a. 0.090  b. 0.220  c. 0.217


22 0.0242 
23 a. 0.1234 b. 0.1247 
24 0.677 
25 0.1008 


26 a. 0.8009  b. 0.1493

28 a. 3/5  b. 3/5  c. 1 − 1/(10k²)

29 a. f(y) = (41 − 2y)/400 if y = 1, 2, ..., 20
b. 19/8000  c. E(X) = 21/2, Var(X) = 133/4
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32 a. F(x) = (x − 50)/25, 50 ≤ x ≤ 75, zero if x < 50, one if x ≥ 75
b. 2/5  c. 125/2  d. 625/12

33 1/3

34 a. UNIF(0, 5) b. 2/5 

36 a. 24  b. 3√π/4

37 a. 0.353  b. 0.214  c. 20 days

38 a. 0.122  b. 0.0028

39 0.2636

40 a. 0.861  b. 0.333  c. Same as a (no memory)  d. 10,000

47 my =1 

42 0.5768 

43 θ^k Γ(1 + k/β)

44 a. θ/(κ − 1)  b. 2θ²/[(κ − 1)(κ − 2)]

45 E(X) = 50, Var(X) = 10,000

46 =a. 0.362 -b. 0.967 ©. 300,/n_ d. 3(100)*(32 — 3n) 

47 a. 0.0183  c. E(X) = 5√π, Var(X) = 100(1 − π/4)

48 a. x, = (1 —p)**— 1) 
b. m= 10(,/2—1) : 

49 a. 0.846  b. 0.0377  c. 20 days

51 a. 0.937  b. 0.688  c. 0.341  d. 0.20  e. 0.38  f. 1.96

&2 @. 05 b. 0227 ©. 0.290 d. 3.822 9. 0.658 

53 a. 0.816 b. 5.88 

54 a. 0.8413  b. 0.9104  c. 0.8413  d. 16.58

55 a. 0.6366  b. 324.36

56 a. 12 b. 25 

57 a. E(Y) = 350p − 100  b. p > 2/7  c. No

58 e@. 1002.857  d. 1001.8 

59 d.6 @. 6 

CHAPTER 4 


1

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1   1/12  1/12  0     0     0     0     0
2   0     1/12  1/12  0     0     0     0
3   0     0     1/12  1/12  0     0     0
4   0     0     0     1/12  1/12  0     0
5   0     0     0     0     1/12  1/12  0
6   0     0     0     0     0     1/12  1/12


3 a. 0.0399  b. Same as (a)  c. 0.000609  d. 0.0793  e. 0.0153

g. 0.9583  h. 0.0417  i. 0.00005541


4\(4\(4 40 2 
j. OY ies 0<x,O0<y, 0<z4,x+yt+z2<¢5 
x/\y/\2/\5 —x-y- 5 


4 a. 0.0331  b. 0.0666
5 a. 0.0465  b. Same as a  c. 0.00089  d. 0.0922  e. 0.0191




: (Sjoraraanars x =0,1,2,3,4,5 


6 a. 0.000382  b. 0.00344  c. 0.00153


12! Sr 
‘ Z ks 1 6)r1tx3tx5 1 2 12—-x,~-x3-~x5 
g RET a (URE Er ME ai (1/2) 
7 1/18
8 a.e* . Both POI(2) c. Yes 
9 a. x1:       1      2      3
     f1(x1):   1/4    14/45  79/180
     x2:       1      2      3
     f2(x2):   5/36   19/36  1/3
b. No  c. 101/180  d. 25/36
Oey 13 ) 
9 sie Es 
10° af yee Os gS 0 Fete? 


() 
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71 


12 
14 


75 
16 
17 


18 


x 0-( 2 M2): a(t 


d. No. e. 0.668 


ei *] 
f. PY = y|Xe 4] aL) A Al- y=0,1 


rt ie (*) , 
1 
g. PLY =y|X =x]= 


h. P[X + Y ≤ z] = 0 if z < 0; 0.059 if 0 ≤ z < 1; 0.441 if 1 ≤ z < 2; 1 if 2 ≤ z


2! Sn eC RaS 
Wieoe— yl (1/4)? 


a. f(x, y)= 
2 par Bes 2 
ce. fiX)= : (1/4)"(3/4)7 "73, f(y) = ; (1/4) 


d. No @e@. 2/3 
F(l, y) 


fF, PLY = y|X = 1] =~ = (2/3(1/3)'";, y= 90,1 


Si) 


g. PLY =y|X=x]= (3 “aanyse 0<y<2-x 


y 
No 
a. Both EXP(1) 


b. F(x, y) = (1 − e^{−x})(1 − e^{−y}) if x > 0 and y > 0, 0 otherwise


ec.e? d. 1/2 @. 3e7? Ff. Yes 


a. f(x1, x2) = 2(1 + x1)^{−2}(1 + x2)^{−3} if x1 > 0 and x2 > 0  b. 1/3


1/3 
a. fi(x,) = 3(1 — x4)?3 O<x, <1 
b. f{x2) = 6x1 = x2); Oi x, <A 
©. fi, (Xp X2) = O(1 — x2); 0< xy <x, <1 
a. f(x, y) = x + y; 0 < x < 1 and 0 < y < 1
b. 1/8  c. 1/2  d. 1/24  e. 19/24
f. P[X + Y ≤ z] = z³/3 if 0 < z < 1; z² − z³/3 − 1/3 if 1 ≤ z < 2; 1 if 2 ≤ z
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19 a. k = 2
b. f,(x) = (1 + 3x\l — x) if 0<x <1; f,(y) =3y? ifO<y<1 
0 ifx <0 or y<0 
xy>+x*y—x? if0<x<y<l 
c. F(x, y)=<y? ifx>y and 0<y<1 
x+x?—x? ify>1l and 0<x<1 
1 ify>1 and x>1 
2(x + y) tes 
: =—————-- ifx<y<l 
9. SOI = Tapa ets? 
2 
eo. fal = ito<x<y 
21 a. fix) =(2/3(x+1) if0<x<1 


b. fly) =1 if0<y<1 
ée.fVYiIy=1ifO<y<10<x<i1 
d. 4/9 @. 73/162 -f. Yes 
22 Be f (X%4, yy 005 Xp) = 3X 1X20 x,)? ifallO<x,<1 
b. 1/8 c. (1/8)" 
23 @. f (X14, X25 2545 Xq) = WK xq +++ X,)exp(—x} — x3 — +++ — x2) ifallO <x, 
b. 1-e 4 ee. (Le 4 


6(x,-—x,) if0<x,<x,<1 
2 ‘ 1X3) = ; 
e a. F(a 3) 10 otherwise 
Uf(x3— x1) if0<x,<x,<x,<1 
© fC lta Sah ‘; ee ane a 
2x3 if 0<x, <x, <x, <1 
@: [Oy aly) = ‘. otherwise 
27 a. 
b. 
ec. 0.72 
28 a. Yes f(x, y)=g(x)h(y) over (0,1) x (0,1) Bb. 1/6 
29° a. No f(x,y =0<f(Ofly) if0<1-x<y<1 
b. 1/27 e. f,(x) = 12x(1—x)? if0<x<1 
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30 a. fi(x) = 30x71 — x)? if0<x<1 
b. f(y|x) = 2/1 — x)? if0<y<1—x ec. 0.96 
2 
37a. 5/48 bb. fsa) = = if0<x,<1 
2 x4 +X, i 
C. f(x, |x.) =z OF 0 < x, <x, <1 
3.2343 
CHAPTER 5
1 a. 15  b. 63
2 a. 720  b. 23.04
3 a. 2/15  b. −2/75  c. −2/3  d. −2/5
e. −2/15  f. −2/15  g. 3/25
4 a. 4/3  b. 12/5  c. 16/5  d. 0
5 a. 7/12  b. 7/6  c. 1/3  d. −1/24  e. E(Y|x) = (3x + 2)/(6x + 3); 0 < x < 1
7 a.7 ob. 116:-¢.-16 d.-20 
11 a. f1(x) = 6x(1 − x); 0 < x < 1
b. f2(y) = 3y²; 0 < y < 1
c. 1/40  d. √3/3  e. f(y|x) = 1/(1 − x); x < y < 1  f. (1 + x)/2
12 a. 1/6  b. 1/3  c. 1/9  d. 0  e. 1/3
13 a. f(y|x) = 1/(2x); 0 < y < 2x  b. E(Y|x) = x  c. 1/2
15 a. 12/5  b. 6/25
17 a. 1  b. 2
18 115/88
19 a. 144  b. 1.0944
20 a. 3/2  b. 25/12  c. 6
21 M(t1, t2) = 2/[(1 − t2)(2 − t1 − t2)]; t2 < 1, t1 + t2 < 2
22 M(t, 2) = [1t2 — 1) -— Wty +t — DV/ty3 2 <1,t, +t, <1 
4 
CHAPTER 6 
1 a. f_Y(y) = 1; 0 < y < 1


b. f_W(w) = 4(ln w)³/w; 1 < w < e
c. f_Z(z) = 4e^{4z}; −∞ < z < 0
dl. fy(u) = 0.5(1 + 12uyu74?; O<u< 0.25 
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2 a. fy)=4y?; O<y<l 
b. fw) =1/w; et h<we<l 
c. f(z)=1fl—2); O<z<1-—e7! 
d. fou) =2(1— 4u)y74/?; O<u< 1/4 
3 a. X = 2πR; f_X(x) = 6x(2π − x)/(2π)³; 0 < x < 2π
b. Y = πR²; f_Y(y) = 3(π^{1/2} − y^{1/2})/π^{3/2}; 0 < y < π
8 y=x* or y=1—x* 
7 a. y = −ln(1 − w)
b. w = x if B(x − 1; 3, 1/2) < w ≤ B(x; 3, 1/2); x = 0, 1, 2, 3
10 a. f_Y(y) = exp(−y); y > 0
b. F_W(w) = 1/2 if 0 ≤ w < 1, one if 1 ≤ w, zero otherwise
11 Y ~ BIN(n, 1 − p)


+r—1i 
12 son=(?*"> Jou ~pr: y=0,1,2,... 


13. fy) = y"2/24 if 0 < y <8; y'!2/48 if4<y <16 
14 a. F,(w)=1—-(4+ 2wer?"; w> 0 
b. fy, lu, v) = A(v/u?)exp(—2v(1 + 1/W); u > 0,0 > 0 
ec. fou) = 1/1 + ur u>o 
78 Y ~ POI) 
16 a. fo, pu, v) = Iu»); b<o<u 
b. fou) =(n u)/u?; 1 <u 
17 a. fly) = y exp(—y’/2); y> 0 
b. f,w) = w 71+ w/a; w> 0 
18 a. fs f(s, ) =e 8; O<t<s/2 
b. ft) =e O<t 
c. fs(s) = e Se? — 1); s>0 


-gi(Vi) = n/yi**s v1 > 1 
: IA Vn) = nY, 7 1 ile a deer 1< Vn 


22 ~— 0.8508 
pe! k 
23. M\t)= (5) ; Y~NB(, p) 
1 — ge 
24 a. M(t) =(1—28)7!° b. Y ~ GAM(, 10) 
25 a. X,~POI10) b. W ~ POI(15) 
27 a. LOGN(Y. u;, Yo?) &. LOGN(S a; u,, ¥. a?) 
c. LOGN(, — uy, 0? + 03) 
29. V1 1°, Wy) = MY WAL SY <<, 
b 
Cc 
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37 


32 


33 


35 


(2r — Dy 1" 


@. GAY.) = (e—Diey yp> 


a.g(y)=ne"; O<yy 

+ Gn Yn) = ne "(1 —e "4s O<y, 

ce. fpr) =(n— De" —e "7; O<r 

: n!} is 

Fs Gy, +++» Yp) = ——, EXP (- y nary): O<yy<-<y, 
(n—r)! 1 

a. gi(yy) = Se"; O< yy 

b. gs(ys) = Se "51 —e8)*; O<ys 

©. 93(¥3) = 30e7*(1 —e-)?; O< V3 

d 

a 


in 


- 9s) = (1/0)e-*"9; = 1/(1/0, += + 1/8,); yy > 0 
: Gi) = 1 —(1 — py; yar 1,2, 3, see 


n 


b. Gly) = (")c = (1 = PPT ps y= Dees 


i= 


c. Gn) = fl _ (1 ad py)"; Ya = 1, 2, a0) 
@. PLY, <1] =G,(l)=1-(1— py 
a. Mj) =(i— 62)"! bb. Y ~ DE(6, 0) 


CHAPTER 7 


7 


0 ify<1 
i oin= {ayy ee 
5. Degenerate at y = 1 
- {0 ify <1 
a= 1) ay ae 
a. Nob. Yes, Gy) = exp(—exp(—y)) 
a. Degenerate aty=1 6. No limiting distribution 
0 ify <0 
7 = Feat ity >0 
N(O, 1) 
a. a= 0.733, b=1.193. b. a= 0.634, b = 1.032 
0.8508 
a. 0.1587 b. 0.1359 
a. 0.9394 b. 11.655 
a. 90 b. 122 oc. 92 
a. 0.4364 
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0 if y<0 
d. Gy=4y? if0<y<1 
1 


if y21l 
e. N(B/2*, [B/(62*/)}?/n). #. B/21/? g. WEI(1, 6) 
17 m= p,c= o./n/2 
0 ify <0 
19 GO)= fay fy 
23 N(O, ¢?), N(O, 1/c?) 
24 a. 0.2206 . 0.6044 
25 a. Type I (for maximums), a, =.1,.b, =Inn 
Type I (for minimums), a, =.1,.b,= —Inn 
ad. Type I (for maximums), a, = 6n'"(e1*" — 1), b, =0 
Type If (for minimums), a, = 0/(kn), b, = 0 
28 Type U (for maximums), a, = tan[x(1/2 — 1/n)], b, = 0 
CHAPTER 8 
1 0.987
2 a. 0.39  b. 0.0043
3 a. 2U/n—5(W — U*/n(n—1) b. Win c. ¥ 
5 a. GAM(100, 10)  b. 0.95  c. 12 spares
7 approx. 0.90
8 No
10 0.685
11 a. 0.95  b. 0.95
12 approx. 0.95
12 a. 0.9772  b. 0.921  c. 0.921  d. 0.988  e. 0.95
15 a. N(0, 207). b. Nu, 50?) c. t(k—1). d. x%(1). e@. t(k — 1) 
f. ¥7(2) g. unknown A. «(1) i. F(t, 1) j. Cauchy k. unknown 
I. tk) m. Pn+k—1) on. S+7) 0. (1) p. Fn—1,k—1) 
16 a. 0.6898  b. 0.05  c. 0.75
17 a. 0.85  b. 27.14  c. 0.90  d. 0.19  e. 0.256  f. 2.50
g. 0.045  h. 0.975
18 a. 0.144  b. 0.95  c. 0.05  d. 0.90  e. 0.592
23 0.9929 
25 b. 0.924 e. 13 
27 χ²_{0.95}(10) = 18.31; approx. = 18.29   χ²_{0.05}(10) = 3.94; approx. = 3.93
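Both approximate values in answer 27 are reproduced by the Wilson-Hilferty approximation χ²_γ(ν) ≈ ν[1 − 2/(9ν) + z_γ√(2/(9ν))]³, where z_γ is the 100γ-th standard normal percentile. That this is the approximation the exercise intends is an assumption; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def chi2_percentile_wh(gamma, nu):
    """Wilson-Hilferty approximation to the 100*gamma-th percentile of chi-square(nu)."""
    z = NormalDist().inv_cdf(gamma)
    c = 2 / (9 * nu)
    return nu * (1 - c + z * math.sqrt(c)) ** 3


print(round(chi2_percentile_wh(0.95, 10), 2), round(chi2_percentile_wh(0.05, 10), 2))
```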
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CHAPTER 9


7 


a. X(1—X) b. I(X—1) ec. 2/X 


3 a. —n/Y InX,; b. —1+n/Y In X; ©. 2/X 
& Kin 
7 a. Xb. X?-X ec. (1—1/X} 
1. a. V belle; + OJ =nf3_ b. 6 = 1.404 
15 a. c=nf(n—1) b. [n?/(n —1)]p(1 — p) 
N N 
e. Y XfnN), Y (n/n — Y](Xi/nl — Xi/n)/N 
i=l i=1 
19 c=(n+.1/2(n— DJ 
21 a. p(l—p)y/n b. pl—pyl—2p)/n oc. X 
23 a. Yes b. Yes 
27 a. Estimator: 6, 6, 63 6, 
2: 2 
Risk: (ae See 
9 9 
&. Max Risk: 1/2 5/8 5/9 1/3 (8, is minimax) 
c. Bayés Risk: 1/2 5/8 13/27 7/27 
Patt yoP ati. Prit2 
2g ge be | ute +2) 
33 a. pin b. pen c. X d.e7* e@. No fF. Yes 
35 N(p, p(t — p)/n) 
37 @. ARE = 1/2. 6. ARE = 2/n = 0.64 
b.(nt+1fB+h x) ¢. B+Y x)/n 
g, Hoso2n+2) 5 b+) x) 
— Se. em ee @. 2 Mn in 
2B + > x;) Xo,s0(2n + 2) 
CHAPTER 10 
7 si[( I] x: ' ifs =}. x,, zero otherwise 
ist 
3 T(n/2)/(n"/2s"2-1) ifs = )° x}, zero otherwise 
5 a. (ni x) Tans ifs =}. x,, zero otherwise 
i=1 
BD. f (x4, ..-5 X_ 5 8) = 8" exp(—F. x, /OXT] x,); all x; > 0 
2 (Say, 
i=1 
9 a.S=¥ X? b. Only when n = 1 
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73 S,=[] X,, 8, = ]] d-X) 
i= 


15 p = n/s. Same as usual MLE 
17 ce. X1,, — 1/n —In(1 — p) 
2 a. (YX (LX MMn- 1) b. (LX)? — XI/Enln — 1] 
23 a, K+: 3(1.645) b. O(L/n(c — X)/03./n — 1) 
25 a. (—1/n(V In X) b. (1 —n( In X) 
27 a. S?= [JY X73 — (YX)? /n]n— 1) 
b. X + 1.645cS with c = M(n — 1)/2)/f,/2F(n/2)] 


29 a. Use regular exponential class b. X oc. [| X,;=X" 
t=1 


31 a. n/Y Ini +X) b. Vint +X) ¢. 1f(nO?) d. (1/n) Yin + X) 
e@. N(6, 6?/n) for 6, N(Q~', 8~7/n) for 1/8 #. (n— 1)/¥ In(1 + X) 
33 a. ce] 5 Xin eH (n cae na b. S Xtey F (n 2 NX yn 
f=1 i=1 


t 


CHAPTER 11
1 a. (18.1, 20.5)  b. lower: 18.3, upper: 20.3  c. n = 25 for length = 2
(17.9, 20.7)  e. (4.68, 33.39)
. 14.40 Bb. exp(—t/14.40) 
. Q ~ EXP(1/n) 
» (Xtin + (L/n)in(a/2), x4., + A/n)In(l — @/2)) with a = 1 -y 
. (161.842, 161.997) ; 
9 (0.81, 2.88) 
11 (0.04, 0.21) 
13° a. (2n/x2-gy,(2nk), 2nk/x2,,(2nK)) Bb. (5.62, 15.94) 
17 a. p, =1—y"  b. 0.0209 
21 (1.35, 2.10) 
23 a. (y29(2n + 2/[2nX + BY, ¥2—a)al2n + 2)/[2nk + BY) ©. (0.119, 0.326) 
29 oe. (Ra 2paajg RJ R$ 21 -a2R//n) ad. (0.64, 2.73) 




CHAPTER 12 
1 a. a = 19.59, b = 20.41
b. ForA:B+1 For B: f= 0.0091 
c. ForA:f8=0.0091 For B:f=1 d.a=010 e. B= 0.0091 




3 a. z0 = −2.236 > −2.326  Can't reject H0  b. β = 0.1515  c. n = 24
d. t0 = −1.12 > −2.539  Can't reject H0
e. v0 = 33.78 < 36.19  Can't reject H0  f. n = 50


5 n = 32 from Table 8, Appendix C (d = 0.8, 1 − β = 0.95)
t_{0.995}(31) = 2.7454 (by interpolation in Table 6)
C = {t0 | t0 < −2.7454 or t0 > 2.7454}
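The interpolated percentile in answer 5 is consistent with linear interpolation in ν between the ν = 30 and ν = 40 rows of the t table. Choosing linear interpolation in ν, rather than in 1/ν, is an assumption; it reproduces the quoted value, which is slightly above the exact t_{0.995}(31) of about 2.744.

```latex
t_{0.995}(31) \approx t_{0.995}(30) + \frac{31 - 30}{40 - 30}\,\bigl[t_{0.995}(40) - t_{0.995}(30)\bigr]
             = 2.750 + 0.1\,(2.704 - 2.750) = 2.7454
```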


7 a. z = −1.79 < −1.28  Reject H0
b. B(6; 20, 0.5) = 0.0577 < 0.10  Reject H0
c. π(0.2) = 0.9133  d. p-value = 0.0577
9 a. −2 < −1.746  Reject H0
b. −2 < −1.75  Reject H0
c. −2 < −1.86  Reject H0
d. 0.29 < 0.8  Can't reject H0
17 a. Reject Hy) ifx 20.95 b. 0.0975 
ec. Reject Hy TT] x; 2c where ATi X,2cld= i| = 0.05 
73 a. Reject Ho ifx>4 b. power =05 
917 a. Reject Hy if ¥. x?/o2 > x? _,(n) 
b. n(o) = 1 — HU(o2/o7)y?_,(n); n] _¢.. 0.968 
19 a. Reject Hy ifX < up + 24/s/n 
b. Reject Ho if & > uy + 24_4//n 


27 Reject Hy if Y. x, > c where P[Y. X,>cl0=6,] = 


23 Reject Hy ift = }° x, > ky where ko is the smallest k such that 
Bk —-1;n, pp) >1—a 


27 a. Reject Ho if —2n[In (X/09) — X/89 + 1] > x?_{1) 
b. Reject Hy if 2n&/0, > 2 _,(2n) 

29 Reject Ho if Xun < 9901" 

37 ——-Reject Hy if —2n[In(6,/6) + (65/6 — 1)] > x2_.(1) 

37 a. k0* = 0.1053, k1* = 18
c. Reject H0 if Σ xi² ≥ 2.47n + 5.07  Accept H0 if Σ xi² ≤ 2.47n − 6.50
d. E(N | σ = 1) = 4, E(N | σ = 3) = 1  e. Yes. Rejects with n = 4

39 a. Accepts H0 with n = 14  b. 12  c. 6


CHAPTER 13 


1 χ² = 4.76 (without continuity correction)
χ² = 4.30 (with continuity correction)
4.30 > 2.71  Reject H0
For a one-sided test use binomial test
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3 a. 2.67 < 6.25  Can't reject H0
b. 20 > 6.25  Reject H0
c. 18.18 > 4.61  Reject H0

5 a. 5 > 4.61  Reject H0
b. 0.78 < 4.61  Can't reject H0
7 0.090 < 5.99  Can't reject H0
8 1.54 < 7.78  Can't reject H0
11 17.93 > 9.49  Reject H0
a. 16.504 > 13.28  Reject H0
b. 0.24 < 9.24  Can't reject H0
1§ a. fp=96.70 235.36>15.51 — Reject Hy 
&. pp = 35.65 4.42 < 6.26 Can’t reject Hy 
Gc. p=96.70  & = 37.46 10.31 < 12.02 Can't reject Hy 
17 18.05 > 9.49
21 b. CM = 0.058, (1 + 0.2/√23)(0.058) = 0.060 < 0.102  Can't reject H0
c. D = 0.151, √23(0.151) = 0.724 < 0.803  Can't reject H0


CHAPTER 14 

1 a. t = 10; B(10; 20, 0.5) = 0.588 > 0.05  Can't reject H0
b. t = 2; B(2; 30, 0.25) = 0.091 < 0.10  Reject H0
3 Based on large-sample binomial test, z = 0.85 < 1.96  Can't reject H0
Based on large-sample binomial test, Φ(−5.14) < 0.01  Reject H0
7 a. t = 22; B(22; 60, 0.5) ≈ Φ(−1.94) = 0.0262 < 0.05  Reject H0
b. (5.22, 5.25) (i = 24, j = 37)  c. 5.13 (k = 10)
9 a. 35.39 (k = 11)
b. (22.40, 112.26) based on i = 9 and j = 30; other intervals are possible provided j − i = 21
11 a. 1.349 < 1.383  Can't reject H0
b. t = 2; B(2; 10, 0.5) = 0.055 < 0.10  Reject H0
13 a. t = 4; B(4; 12, 0.5) = 0.1938 > 0.10  Can't reject H0
b. t = 10 < 21  Reject H0
Normal approx. z = −2.27 < −1.28  Reject H0
15 a. U = 5 < 27  Reject H0
Normal approx. u_{0.05} = 28.2



b. t = sum of ranks of positive differences = 3
Two-sided test would reject at α = 0.01 (α/2 = 0.005)
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17 a. r = 0.165; t = 0.472 < 1.860 (α/2 = 0.05)
Can't reject H0: independence
b. R_s = 0.215; p value between 0.268 and 0.280;
t = 0.623 < 1.860  Can't reject H0: independence
21 R_s = −0.259; t = −1.229 > −1.383
Suggests moderate decreasing trend, but can't reject at α = 0.10
23 t = 4 runs; P(T ≤ 4 | H0 true) = 0.001
p-value < 0.002 (for two-sided test)
CHAPTER 15
1 b. β̂0 = 1.182, β̂1 = −1.315  c. 0.76  d. SSE = 0.0184, σ̂² = 0.0023
2 b. Bo = 0.628, 8, = 0.608 c. = 3.06 @. & =0.194 
3 b. β̂0 = 1.182, β̂1 = 0.755
9 a. f= 93.90, é = 37.13 Bb. A = 13.14, 6 = 24.73 
70 p= 93.96, & = 36.77 
718 a. § = 0.00245, 6? = 0.00024 b. &? = 0.00028 
@. (0.0020, 0.0029) -d. (0.00013, 0.00123) 
20 B = 0.00296 
37 f=2217>124, Reject Hy 
32 r = 0.672 
33 t = 3.391 > 1.761, Reject H0
35 a. f = 18.97 > 3.75  Reject H0  b. f = 0.252 < 4.60  Can't reject H0
CHAPTER 16
7 ce. (1/10)?> -d.22,5 e@. 23 (rounded up)...f. 47 (by normal approx.) 
3 6 = (n/) Xn 
5 No (convergent integral)
7 a. 600 hours  b. 0.916  c. n = 7
9 a. 100/6 = 16.67  b. 0.5488  c. 1139 hours  d. 6/100
11 0.5714
713 a. 77(38). b.. 688.63... e.. 142.52 
17 a. 1.177 b&. 133.33< 140.85 ~* Reject Hy 
ce. (0.982, 1.420) a. 0.346. e. 0.05 year or 2.6 weeks 
19 a. 6 = 0.482,8 = 2.07 b. = 4.42, 6 = 83.1 
23 a. 0.4335 -b. 0.6288 c. k= 6; PLX <6] = 0.8893. d. 4, 32 
e. Y =min (X, 3); EY) = 2.652 
f. W=X,4+ X,~ POI); Z = min (W, 6); E(Z) = 5.650 
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25 a. x̄ = [1(211) + 2(93) + 3(35) + 4(7)]/575 = 0.92 (neglecting y ≥ 5)
Another approach: If p = P[Y = 0] = e^{−μ}, then p̂ = 229/576 = 0.398 and
μ̂ = −ln(0.398) = 0.92, which is the MLE based on binomial data for p.
b. o_i: 229  211  93  35  7
e_i: 229.5  211.2  97.1  29.8  6.9
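The expected counts in part b of answer 25 can be reproduced under the assumption that they are e_i = 576 e^{−μ̂} μ̂^i / i! with μ̂ = 0.92. That assumption matches all five printed values. The sketch also recomputes the zero-class estimate −ln(229/576). Variable and function names are illustrative.

```python
import math

# Observed counts o_i for y = 0, 1, 2, 3, 4, copied from part b of answer 25.
observed = [229, 211, 93, 35, 7]
total = 576  # total number of cells used in p = 229/576

# Estimate from the zero class: P[Y = 0] = exp(-mu)  =>  mu = -ln(p).
mu_from_zeros = -math.log(229 / total)


def poisson_expected(total, mu, kmax):
    """Expected counts total * exp(-mu) * mu**k / k! for k = 0, ..., kmax."""
    return [total * math.exp(-mu) * mu ** k / math.factorial(k)
            for k in range(kmax + 1)]


expected = poisson_expected(total, 0.92, 4)
print(round(mu_from_zeros, 2))
for y, (o, e) in enumerate(zip(observed, expected)):
    print(y, o, round(e, 1))
```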


c. 0.602 

27 a. 2.5  b. 0.958  c. 461

29 a. 0.9887  b. 0.9834  c. 0.9955

37 0.6321, 0.0803, 3/2 


35 a. § = 3.583/2 = 1.7925 


1 
b. fx) = (* i rma +y**;x=0, 1)... 6. 0.6959 


d. 2) = 3.583, 29(1 + 9) = 10.011 
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Bayes estimator, 323 
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Bayes rule, 26 
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Bernoulli distribution, 91 

Bernoulli law of large numbers, 237 
Bernoulli trial, 91 

Bernoulli variable, 91 

Best linear unbiased estimators, 502 
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Binomial distribution, 92
normal approximation of, 240 


Poisson approximation of, 105, 240 
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Binomial expansion, 36 
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Boole’s inequality, 16 
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Cauchy distribution, 127 
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Central limit theorem, 238 
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Chebychev inequality, 76 
Chi-square distribution, 268 
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Conditional distribution, 153 
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pivotal quantity method, 362 
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Confidence limit, 360 
Confidence region, 362 
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Conservative test, 405 
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Continuous random variable, 64 
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Correlated random variables, 178 
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Correlation coefficient, 178 
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Countable intersection, 5 

Countable union, 5, 593 
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Coverage, 474 

Covariance, 174 

Covariance matrix, 516 

Cramer-Rao lower bound, 305

Cramer-von Mises test, 458
table of critical values, 613 

Critical region, 391 

Cumulative distribution function, 58 


Degenerate distribution, 77 

Degrees of freedom, 268 

De Morgan’s laws, 5, 590 

Dependent events, 27 

Dependent random variables, 150 

Derived distributions, 267 

Deterministic model, 1 

Difference of sets, 588 

Dirichlet distribution, 212 

Discrete probability density function, 
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Discrete random variable, 56 
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Discrete uniform distribution, 107 
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Distribution function, 59 

Double exponential distribution, 127 
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Efficient estimator, 308 
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Estimate, 290 
Estimator, 290 
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BAN, 315 
Bayes, 323 
best asymptotically normal, 315 
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efficient, 308 
least squares, 501 
maximum likelihood, 294 
moment, 290 
minimax, 321 
unbiased, 265, 302 
uniformly minimum variance 
unbiased, 304 
Event, 4 
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null, 5 
sure, 5 
Exhaustive sets, 23 
Expectation, 61, 67 
conditional, 180 
Expected value, 61, 67
Experiment, 2 
Exponential class, 347 
regular, 347 
range dependent, 350
Exponential distribution, 115 
Exponential type, 252 
Extended hypergeometric distribution, 
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Extreme-value distribution, 560 
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F distribution, 275 

tables of percentiles, 609 
Factorial moment, 82 
Factorial moment generating function, 
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Factorization criterion, 339 
Failure rate function, 541 
Fair game, 62 
Finite collection, 5 
Finite intersection, 591 
Finite multiplier, 177 


Finite population, 177 
Finite sample space, 3 
Finite set, 592 

Finite union, 591 


Gamma distribution, 111 
Gamma function, 111 
Gauss-Markov theorem, 517 
Gaussian distribution, 118 
Generalized likelihood ratio, 418 
General linear model, 515 
Geometric distribution, 99 
Geometric series, 99 
Goodness-of-fit tests, 442 
chi-square, 453 
Cramer-von Mises, 458
Kolmogorov-Smirnov, 460 
Kuiper, 460 


Hazard function, 541 

Histogram, 162 

Homogeneous Poisson process, 106 
Hypergeometric distribution, 95 
Hypothesis testing, 389 


Independent events, 27, 30
Independent random variables, 150 
Indicator function, 341
Indistinguishable objects, 37 
Intensity, 107, 571 
Intersection, 5, 588 

Interval estimate, 360 

interval estimator, 360 
Invariance property, 296, 298 
Inverse binomial sampling, 103 
Inverse transformation, 198 


Jacobian, 199, 205
Joint cumulative distribution function,
Joint distributions, 136
Joint moment generating function, 187
Joint probability density function, 137,
Joint transformations, 204
Jointly sufficient statistics, 337


Kolmogorov-Smirnov test, 460
table of critical values, 613 
Kuiper test, 460 
table of critical values, 613 
K-variate normal, 520 


Laplace distribution, 127
Law of large numbers, 246
Law of total probability, 23
Least squares estimates, 501
Lehmann-Scheffe theorem, 346
Likelihood function, 293
Likelihood ratio tests, 418
Limiting distribution, 232
Linear combination, 267
Location parameter, 124, 363
Location-scale parameter, 126, 363
Logistic distribution, 200
Lognormal distribution, 199
Loss function, 319
Lower one-sided confidence limit, 360
Lower one-sided test, 397


Mann-Whitney test, 483 
table of p-values, 615 
Marginal CDF, 148 
Marginal pdf of continuous random 
variables, 147 
Marginal pdf of discrete random 
variables, 141 
Marginal probabilities, 21 
Markov inequality, 76
Mass points, 56 
Mathematical model, 1 
Maximum likelihood equations, 294, 
298 
Maximum likelihood estimate, 294 
Maximum likelihood estimator, 294 
invariance property, 296, 298 
large-sample properties, 316 
Mean, 61, 67 
Mean absolute deviation, 74 
Mean square error, 309 
Mean square error consistency, 312 
Measurement error, 1
Median of a distribution, 69 
Median of a sample, 218 
Median unbiased, 333 
Method of maximum likelihood, 292 
Method of moments, 290 
Minimal sufficient set of statistics, 337
Minimax estimator, 321
Minimum expected length criterion,
361
Mixed distribution, 70
Mode, 69
Modified relative frequency
histogram, 163
Moment of a sample, 291
Moment of a random variable, 73
Moment generating function, 78
Monotone likelihood ratio, 413
More concentrated, 303
Most powerful critical region, 407
Most powerful test, 407
Multinomial distribution, 138
Multiplication principle, 32
Multiplication theorem, 19
Multivariate normal distribution, 520
Mutually exclusive events, 6, 7
Mutually independent events, 30


Negative binomial distribution, 101 
Neyman-Pearson lemma, 407 
No memory property, 100, 115 
Noncentral t distribution, 400
table of sample sizes, 616 
Noncentrality parameter, 400 
Nonhomogeneous Poisson process, 
574 
Nonparametric methods, 468 
binomial test, 471 
confidence intervals, 472 
correlation tests, 486 
Mann-Whitney test, 483 
rank correlation, 489 
sign test, 469 
tolerance limits, 473 
Wald-Wolfowitz runs test, 492 
Wilcoxon test, 477 
Normal distribution, 118 
table of CDF and percentiles, 603 
Nuisance parameters, 362 
Null event, 5 
Null set, 588 
Null hypothesis, 390 


One-sided confidence limits, 360 
One-sided tests, 397 
One-to-one transformation, 197 
Order statistics, 215
asymptotic normality of, 244
joint pdf of, 215
marginal pdf of, 216
Outcome, 2


Pairwise mutually exclusive, 7
Parallel system, 29, 545 
Parameter, 289 
Parameter space, 289 
Pareto distribution, 118 
Partitioning, 39 
Pascal distribution, 99 
Percentile, 68 
Permutation, 34 
Pitman asymptotic relative efficiency, 
470 
Pivotal quantity, 363 
Poisson distribution, 103 
table of CDF, 602 
Poisson process, 105 
homogeneous, 106 
nonhomogeneous, 574 
Polynomial regression model, 500 
Population, 158 
Posterior density, 324 
Power function, 394 
Principle of least squares, 501 
Prior density, 323 
Probabilistic model, 2 
Probability, 9 
classical, 11 
conditional, 16 
density function, 64 
generating function, 82 
integral transformation, 201
mass function, 56 
model, 2 
set function, 9 
subjective, 9 
P-value of a test, 396 


Quadratic form, 521 
Quantile, 68 
interval estimates, 472 
test of hypotheses, 471 


Random interval, 359
Random numbers, 202
Random sample, 164
Random variable, 53
continuous, 64
discrete, 56
Randomization test, 482 
Randomized test, 411 
Range, 218 
Range dependent exponential class, 
Rao-Blackwell theorem, 344
Rayleigh distribution, 116
Redundant system, 29 
Regression analysis, 499 
Regression curve, 186 
Regression function, 186 
Regression toward the mean, 500 
Regular exponential class, 347 
Relative frequency, 7 
Reliability function, 541 
Residuals, 502 
Risk function, 319 
Runs test, 492 


Sample correlation coefficient, 529
Sample mean, 161
Sample median, 218
Sample moments, 291
Sample proportion, 161
Sample range, 218
Sample space, 2
Sample variance, 161
Sampling distribution, 267
Sampling without replacement, 19
Sampling with replacement, 29
Scale parameter, 112, 363
Sequential probability ratio test, 429
Series system, 28, 546
Set, 2, 587
function, 9
theory, 587
Shape parameter, 111, 116 
of gamma distribution, 111 
of Pareto distribution, 118 
of Weibull distribution, 116 
Significance level, 391 
Simple consistency, 311 
Simple hypothesis, 390 
Simple linear model, 501 
Size of critical region, 391 
Size of test, 391
Skewed distribution, 70
Slutsky’s theorem, 248 
Smallest characteristic value, 257 
Snedecor’s F distribution, 275 
table of percentiles, 609 
Spearman’s rank correlation, 489 
table of p-values, 617
Standard deviation, 73 
Standard normal distribution, 119 
table of CDF and percentiles, 603 
Statistic, 264 
Statistical hypothesis, 390 
Statistical regularity, 8 
Stochastic model, 2 
Stochastic convergence, 233 
Stochastically independent, 150
Student’s t distribution, 274 
table of percentiles, 608 
Subset, 4, 588 
Sufficient statistic, 337 
complete, 346 
factorization criterion, 339 
jointly, 337 
minimal, 337 
Superefficient estimator, 315 
Sure event, 5 
Survivor function, 541 
Symmetric distribution, 69 


t distribution, 274
table of percentiles, 608
t test, 399
one-sample, 399
paired, 403
two-sample, 403
Test statistic, 391
Test of hypotheses, 389
critical region of, 391
for composite hypotheses, 395
for equality of means, 403
for equality of proportions, 406
for equality of variances, 402
for independence, 450
for the mean of a normal distribution, 398, 399
generalized likelihood ratio, 418
most powerful, 407
nonparametric, 468
of randomness, 492
power of, 394
randomized, 411
relationship of confidence intervals, 415
of simple hypotheses, 391 
size of, 391 
two-sample tests, 402 
unbiased, 416 
uniformly most powerful, 412 
Three-parameter gamma distribution, 
127 
Three-parameter Weibull distribution, 
127 
Threshold parameter, 125 
Tolerance limits, 473, 548 
Total probability, 23 
Transformations, 197 
Tree diagram, 23 
Trial, 2 
Two-parameter exponential 
distribution, 127 
Two-sided alternative, 397 
Type I censored sample, 222 
Type II censored sample, 221 
Type I error, 391 
Type I extreme-value distribution, 258 
Type II error, 391


Unbiased estimator, 302
best linear, 502
uniformly minimum variance, 304
Unbiased test, 416 
Uncorrelated random variables, 178 
Uniform distribution, 109 
Uniformly minimum variance, 304 
Uniformly most accurate, 415 
Uniformly most powerful test, 412 
Union, 5 
Uniqueness of MGF’s, 82 
Universal set, 588 
Upper confidence limit, 360 
Upper one-sided test, 397 


Variance, 73
conditional, 182
of sample, 161, 266
stabilizing transformation, 388
Venn diagrams, 588 


Wald-Wolfowitz runs test, 492 
Weibull distribution, 116 
Weighted least squares estimates, 538 
Wilcoxon test, 477
table of critical values, 614
Wilson-Hilferty approximation, 281 
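Notation and Parameters | Continuous pdf f(x) | Mean | Variance | MGF M(t)

Student's t
$X \sim t(\nu)$, $\nu = 1, 2, \ldots$
pdf: $f(x) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\nu\pi}}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$
Mean: $0$, $1 < \nu$
Variance: $\frac{\nu}{\nu-2}$, $2 < \nu$
MGF: **

Snedecor's F
$X \sim F(\nu_1, \nu_2)$, $\nu_1 = 1, 2, \ldots$, $\nu_2 = 1, 2, \ldots$
pdf: $f(x) = \frac{\Gamma[(\nu_1+\nu_2)/2]}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2-1}\left(1 + \frac{\nu_1}{\nu_2}x\right)^{-(\nu_1+\nu_2)/2}$, $0 < x$
Mean: $\frac{\nu_2}{\nu_2-2}$, $2 < \nu_2$
Variance: $\frac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}$, $4 < \nu_2$
MGF: **

Beta
$X \sim BETA(a, b)$, $0 < a$, $0 < b$
pdf: $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}$, $0 < x < 1$
Mean: $\frac{a}{a+b}$
Variance: $\frac{ab}{(a+b+1)(a+b)^2}$
MGF: *

*Not tractable.
**Does not exist.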
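Notation and Parameters | Continuous pdf f(x) | Mean | Variance | MGF M(t)

Weibull
$X \sim WEI(\theta, \beta)$, $0 < \theta$, $0 < \beta$
pdf: $f(x) = \frac{\beta}{\theta^\beta}x^{\beta-1}e^{-(x/\theta)^\beta}$, $0 < x$
Mean: $\theta\,\Gamma(1 + 1/\beta)$
Variance: $\theta^2[\Gamma(1 + 2/\beta) - \Gamma^2(1 + 1/\beta)]$
MGF: *

Extreme Value
$X \sim EV(\theta, \eta)$, $0 < \theta$
pdf: $f(x) = \frac{1}{\theta}\exp\{[(x-\eta)/\theta] - \exp[(x-\eta)/\theta]\}$
Mean: $\eta - \gamma\theta$, where $\gamma = 0.5772$ (Euler's const.)
Variance: $\frac{\pi^2\theta^2}{6}$
MGF: $e^{\eta t}\,\Gamma(1 + \theta t)$

Cauchy
$X \sim CAU(\theta, \eta)$, $0 < \theta$
pdf: $f(x) = \frac{1}{\theta\pi\{1 + [(x-\eta)/\theta]^2\}}$
Mean: **
Variance: **
MGF: **

Pareto
$X \sim PAR(\theta, \kappa)$, $0 < \theta$, $0 < \kappa$
pdf: $f(x) = \frac{\kappa}{\theta(1 + x/\theta)^{\kappa+1}}$, $0 < x$
Mean: $\frac{\theta}{\kappa-1}$, $1 < \kappa$
Variance: $\frac{\theta^2\kappa}{(\kappa-2)(\kappa-1)^2}$, $2 < \kappa$
MGF: **

Chi-Square
$X \sim \chi^2(\nu)$, $\nu = 1, 2, \ldots$
pdf: $f(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{\nu/2-1}e^{-x/2}$, $0 < x$
Mean: $\nu$
Variance: $2\nu$
MGF: $\left(\frac{1}{1-2t}\right)^{\nu/2}$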
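Notation and Parameters | Discrete pdf f(x) | Mean | Variance | MGF M_X(t)

Binomial
$X \sim BIN(n, p)$, $n = 1, 2, \ldots$, $0 < p < 1$, $q = 1 - p$
pdf: $f(x) = \binom{n}{x}p^x q^{n-x}$, $x = 0, 1, \ldots, n$
Mean: $np$
Variance: $npq$
MGF: $(pe^t + q)^n$

Bernoulli
$X \sim BIN(1, p)$, $0 < p < 1$, $q = 1 - p$
pdf: $f(x) = p^x q^{1-x}$, $x = 0, 1$
Mean: $p$
Variance: $pq$
MGF: $pe^t + q$

Negative Binomial
$X \sim NB(r, p)$, $r = 1, 2, \ldots$, $0 < p < 1$, $q = 1 - p$
pdf: $f(x) = \binom{x-1}{r-1}p^r q^{x-r}$, $x = r, r + 1, \ldots$
Mean: $r/p$
Variance: $rq/p^2$
MGF: $\left(\frac{pe^t}{1 - qe^t}\right)^r$

Geometric
$X \sim GEO(p)$, $0 < p < 1$, $q = 1 - p$
pdf: $f(x) = pq^{x-1}$, $x = 1, 2, \ldots$
Mean: $1/p$
Variance: $q/p^2$
MGF: $\frac{pe^t}{1 - qe^t}$

Hypergeometric
$X \sim HYP(n, M, N)$, $n = 1, 2, \ldots, N$, $M = 0, 1, \ldots, N$
pdf: $f(x) = \binom{M}{x}\binom{N-M}{n-x}\Big/\binom{N}{n}$, $\max(0, n - N + M) \le x \le \min(n, M)$
Mean: $nM/N$
Variance: $n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N-n}{N-1}$
MGF: *

Poisson
$X \sim POI(\mu)$, $0 < \mu$
pdf: $f(x) = \frac{e^{-\mu}\mu^x}{x!}$, $x = 0, 1, \ldots$
Mean: $\mu$
Variance: $\mu$
MGF: $e^{\mu(e^t - 1)}$

Discrete Uniform
$X \sim DU(N)$, $N = 1, 2, \ldots$
pdf: $f(x) = \frac{1}{N}$, $x = 1, 2, \ldots, N$
Mean: $\frac{N+1}{2}$
Variance: $\frac{N^2-1}{12}$
MGF: $\frac{1}{N}\cdot\frac{e^t(1 - e^{Nt})}{1 - e^t}$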
*Not tractable. 


